Preface



Although diagrammatic representations have been a feature of human
communication from early history, recent advances in printing and electronic media
technology have introduced increasingly sophisticated visual representations into
everyday life. We need to improve our understanding of the role of diagrams and
sketches in communication, cognition, creative thought, and problem-solving.
These concerns have triggered a surge of interest in the study of diagrammatic
notations, especially in academic disciplines dealing with cognition,
computation, and communication.

We believe that the study of diagrammatic communication is best pursued
as an interdisciplinary endeavor. The Diagrams conference series was launched
to support an international research community with this common goal. After
successful meetings in Edinburgh (2000) and Georgia (2002), Diagrams 2004
was the third event in the series. The Diagrams series attracts a large number of
researchers from virtually all academic fields who are studying the nature of
diagrammatic representations, their use in human communication, and cognitive or
computational mechanisms for processing diagrams. By combining several earlier
workshop and symposium series that were held in the US and Europe - Reasoning
with Diagrammatic Representations (DR), US; Thinking with Diagrams
(TWD), Europe; and Theory of Visual Languages (TVL), Europe - Diagrams
has emerged as a major international conference on this topic.

Diagrams is the only conference series that provides a united forum for all
areas that are concerned with the study of diagrams. We regularly attract delegates
from fields as diverse as architecture, artificial intelligence, cartography,
cognitive science, computer science, education, graphic design, geometry, history of
science, human-computer interaction, linguistics, philosophy and logic, and
psychology, plus many more. Because of this diversity, we take care in planning the
programme to encourage broad interaction, through informal poster sessions,
and through tutorials and invited presentations that provide introductions to
some of these disciplines.

In preparing for Diagrams 2004, we invited submission of tutorial proposals,
full papers and posters. Submissions were received from 18 countries. They
included 53 full-length papers, 33 poster submissions, and 5 proposals for tutorials
on research approaches from specific disciplines. The selection process was
rigorous, involving full peer review of all submissions. The acceptance rate was 34%
for full papers and 73% for posters. A selection of the full paper submissions was
also accepted for presentation as posters.

The final programme included invited talks from Marcus Giaquinto and
Gunther Kress, tutorials by Mary Hegarty, Jesse Norman, and Atsushi Shimojima,
18 paper presentations, and 42 poster presentations, panels and workshops. The
result was a programme that was balanced across a wide range of disciplines,
covering both theory and application, and with worldwide international
participation. It was our pleasure to sustain these virtues of the Diagrams series, and
we enjoyed the lively interaction that is always a feature of the meeting itself.

We wish to thank all members of the organizing and programme committees 
for the work that they put in toward the success of Diagrams 2004. We thank 
the University of Cambridge Computer Laboratory for supporting and hosting 
Diagrams 2004. We are grateful for financial support from the Cognitive Science 
Society, the Office of Naval Research, and the Engineering and Physical Sciences 
Research Council. 
Diagrams in the Mind and in the World: Relations
between Internal and External Visualizations 



Mary Hegarty 

University of California, Santa Barbara, Santa Barbara, CA 93106
hegarty@psych.ucsb.edu



Abstract. Recent advances in computer technology and graphics have made it 
possible to produce powerful visualizations of scientific phenomena and more 
abstract information. There is currently much excitement about the potential of 
these computer visualizations, particularly in education and training in science, 
technology and medicine. This paper examines three possible relations that 
might exist between internal and external visualization. First, external 
visualizations might substitute for internal visualizations. Second, their
comprehension and use may depend on internal visualizations. Third, they 
might augment and be augmented by internal visualizations. By reviewing these 
possibilities, it is argued that the design of external visualizations should be 
informed by research on internal visualization skills, and that the development 
of technologies for external visualizations calls for more research on the nature 
of internal visualization abilities. 



Introduction 

Recent advances in computer technology and graphics have made it possible to 
produce powerful visualizations of scientific phenomena and more abstract 
information. There is currently much excitement about the potential of these 
computer visualizations, as indicated by the publication of several recent reports and 
books on scientific and information visualization (Nielson, Shriver & Rosenblum, 
1990; Card, Mackinlay & Shneiderman, 1999; Spence, 2001). These external
representations are seen as having the potential to augment, enhance, or amplify
cognition (Card et al., 1999; Norman, 1993), and educational theorists have stressed
the need for exposing students to these powerful visualizations in science education. 

The goal of this paper is to examine the possible relations that might exist between 
external visualizations and internal visualizations. An external visualization is an 
artifact printed on paper or shown on a computer monitor that can be viewed by an 
individual. An internal visualization is a representation in the mind of an individual. 
For the purposes of this paper, I define a visualization as any display that represents 
information in a visual-spatial medium. Although computer visualizations are often 
dynamic and interactive, my definition also includes static displays such as graphs, 
diagrams and maps. Internal and external visualizations can represent physical 
phenomena that are spatial in nature, such as the development of a thunderstorm. 
They can also depict more abstract phenomena, such as the flow of information in a 
computer program or the organization of information on the world-wide web. 
Although there is much excitement about the potential of external visualizations at
present, the design of external visualizations is often seen as purely a computer 
science problem, with the goal of designing systems that can present sophisticated and 
realistic visualizations. Implicit in this view is the idea that external visualizations do 
not just augment but can perhaps replace internal visualizations. I argue that the 
potential for developing external visualizations does not replace the need to 
understand or develop internal visualization skills. On the contrary, I argue that the 
design of external visualizations should be informed by research on internal 
visualization skills, and that the technological developments call for more research on 
the nature of internal visualization abilities. 

External Visualizations. In recent history, technology has significantly improved our 
ability to create external visualizations. It is probably not surprising therefore that 
current research on the role of visualization in thinking focuses on external 
visualizations. External visualizations are seen as important ways of augmenting 
human cognition. This is evident in Card, Mackinlay & Shneiderman's (1999)
definition of visualization as “The use of computer-supported interactive visual
representations of data to amplify cognition” (p. 6) or Norman's (1993) statement that
“the real aids come from devising external aids that enhance cognitive abilities” (p.
43). Educational research emphasizes the need to expose children to powerful
external visualizations of data. For example, in a recent article on the potential of 
scientific visualization for education, Gordon and Pea (1995) concluded: 

“The case has been made that SciV shows remarkable potential to help 
students learn science. As an extraordinary plastic medium, SciV affords the 
construction of provocative images that can resemble physical phenomena and 
serve as the basis for the construction, debate, and negotiation of meaning that 
stands at the heart of the process of education” (p. 276).

Cognitive scientists have made important contributions to recent research on 
external visualizations. One is a set of theoretical proposals for why external 
visualizations are effective. For example, in a seminal article entitled “Why a diagram
is (sometimes) worth ten thousand words,” Larkin and Simon (1987) argue that diagrams allow insights
to be gained by powerful and automatic perceptual inference processes, and that they 
perceptually group information that must be integrated mentally (see also Koedinger 
& Anderson, 1990). Another is a specification of principles that make external
visualizations more or less effective (e.g., Cheng, 2002; Kosslyn, 1994; Shah, Mayer
& Hegarty, 1999). Cognitive scientists have also developed models of how humans
comprehend and make inferences from external visualizations (e.g., Pinker, 1990;
Carpenter & Shah, 1998; Hegarty, 1992). 

Internal Visualizations. There has also been an important tradition of research in 
cognitive science on internal visualization, that is, our ability to internally represent 
objects, events and more abstract phenomena as mental images, and our ability to 
infer new information by transforming these images. Studies of internal visualization
have examined people's ability to construct, inspect and transform mental images
(Kosslyn, 1980). Shepard & Metzler's (1971) classic research on mental rotation 
showed that mental transformations are isomorphic to physical transformations of 
objects. More recent research, using both behavioral methods (Finke, 1989) and 
cognitive neuroscience methods (Kosslyn, 1994), has demonstrated that imagery and
perception involve some shared mechanisms. Our ability to construct and maintain
internal visualizations has also been an important topic in the working memory
literature (Baddeley, 1986; Logie, 1995). In this literature, the internal visualization
system is called the visual-spatial sketchpad, and has been shown to be important in 
such activities as visual imagery, playing chess, and playing video games (Logie, 
1995). Finally, the literature on human intelligence has identified spatial ability as 
one of the most important components of human intelligence. This research suggests 
that there may be several somewhat dissociable spatial abilities, but the most robust 
and well-documented spatial ability, known as spatial visualization ability, is defined 
as reflecting “processes of apprehending, encoding, and mentally manipulating spatial 
forms” (Carroll, 1993, p. 309). Examples of tasks used to test spatial visualization 
ability are shown in Figure 1. All of these tasks involve imagining the result of spatial 
transformations, such as rotating an object or folding a piece of paper. 

Psychological research on internal visualization has been more concerned with the
nature of the imagery system than with its role in thinking and reasoning. However,
research in the history and philosophy of science indicates that internal visualizations 
have been quite powerful in scientific discoveries and inventions (Ferguson, 1977; 
Miller, 1986; Nersessian, 1995). For example, Einstein claimed that he rarely thought
in words, and had difficulty translating his images into equations and verbal
communications (Miller, 1986). Similarly, the ability to mentally simulate the behavior of
machines has been reported to be central to design and invention (Ferguson, 1977;
Shepard, 1978). Nikola Tesla, Oliver Evans (inventor of the automatic flour mill) and
Walter Chrysler (founder of the automobile company) are among the many famous 
engineers who claimed to mentally imagine their inventions in great detail before 
beginning to build them (Ferguson, 1977). Research on individual differences in 
spatial visualization also points to the importance of internal visualization in science, 
engineering and medicine. Spatial visualization ability, which can be seen as a 
measure of internal visualization, is correlated with success in mechanical 
occupations, mathematics, physics and medical professions (Hegarty & Waller, in
press). 

Relations between Internal and External Representations. The purpose of this 
paper is to explore the relationship between internal and external visualizations and 
the implications of this relationship for education and training in scientific, 
engineering and medical professions. I consider three different relations that external
visualizations might have to internal visualizations. One possibility is that external
visualizations can substitute for internal visualizations. That is, a person can have the
same insight (or perhaps a better insight) by viewing or manipulating an external
visualization of some phenomenon as he or she would have by internally visualizing 
the same phenomenon. If this is true, an external visualization can act as a 
“prosthetic”. It is not necessary for someone to have internal visualization skills, and 
education should focus on exposing people to many external visualizations rather than 
developing the ability to visualize internally. A second possibility is that use of 
external visualizations depends on the ability to internally visualize. In this case, 
gaining insight from an external visualization would depend on the same abilities and 
skills as internal visualization. Fostering internal visualization would be an essential 
goal of education and training. Finally, a third possibility is that external 
visualizations augment internal visualizations, by providing information or insights
that are additional to those that can be inferred from internal visualizations. In this 
case there is continuity between what can be internally visualized and what can be 
learned from an external visualization, and education should both foster internal 
visualization and expose students to external visualizations. 
Fig. 1. Examples of items from tests of Spatial Visualization ability. In the first test (the Paper
Folding Test), the test taker has to imagine that a piece of paper is folded and a hole is punched
through it and must choose which of the figures on the right will result when the paper is then
unfolded. In the second example (the Cube Comparisons test), the test taker has to decide
whether the cube on the right could be a rotation of the cube on the left. In the third example
(the Form Board test), the test taker must decide which of the shapes on the bottom would be
needed to fill the rectangle on the top, assuming that the shapes can be rotated to fit.



These three possibilities are not mutually exclusive. Each of the possibilities might 
be true for different types of visualizations, different types of content, or different 
types of people. For example, viewing a data graph may act as a substitute for 
internally visualizing the graph, but understanding an animation of a thunderstorm 
may be possible only for those with high spatial visualization ability. Furthermore, 
external visualizations might both augment internal visualizations and depend on 
internal visualization ability. However, for the purposes of exposition, I will consider
them as three separate hypotheses. I will consider each of the three situations in
more detail and examine where some cognitive studies on visualization fit within this 
framework. I will draw on research concerning computer animations of mechanical 
systems, visualizations of anatomy used in medical education and practice, and 
visualizations of meteorological phenomena. Research on these topics has examined 
how people comprehend and learn about phenomena that occur in three-dimensional
space and change over time by viewing visualizations that are presented in the
two-dimensional media of the printed page and the computer monitor.

Relation 1: External Visualization is a Substitute for Internal Visualization 

The first possibility is that viewing an external visualization of a phenomenon can be 
a substitute for internally visualizing the same phenomenon. If this is true, the 
availability of external visualizations relieves us of the necessity of internally 
visualizing and an external visualization can serve as a “prosthetic” for those who 
have difficulty with internal visualization, so there is no need to be concerned about 
the nature of internal visualizations or to develop internal visualization abilities 
among students. This is perhaps a straw-man theory, but it is implicit in any program
of research or practice that focuses exclusively on the development of external
representations as a means of educating and training students without regard to their 
internal visualization abilities. 

Let us explore what it would mean for external visualizations to substitute for 
internal visualizations in this way. First, it would imply that if people have access to 
interactive external visualizations, they would manipulate these visualizations and 
therefore no longer need to carry out effortful internal visualization processes. 
However, recent research suggests that this is not what happens. Trafton and his 
colleagues (Bogacz & Trafton, 2002; Trafton, Tricket & Mintz, in press; Tricket & 
Trafton, 2002) have examined how expert scientists and meteorologists interact with 
external visualizations in problem solving situations, such as weather forecasting and 
data analysis. A striking result of these studies is that even in situations in which the 
experts can manipulate an external display to reveal new data or a new perspective on 
some phenomenon, they rely extensively on internal visualizations, and in some 
studies, they manipulate internal visualizations more often than they use the computer 
interface to manipulate the external display. It is clear therefore that external 
visualizations do not substitute for internal visualizations for these experts. 

It might be argued that external visualizations are more likely to substitute for 
internal visualizations in the case of novices, who have not yet developed the ability 
to internally visualize. For example, by viewing an external visualization of some 
scientific phenomenon, a novice might quickly gain some insight into that 
phenomenon. This view assumes that if an individual (Person 1) has an insight that is 
based on or at least accompanied by an internal visualization, and externalizes that 
visualization, a second individual (Person 2) can view that external visualization and 
have the same internal representation and insight as Person 1. In the extreme, this 
would be true, regardless of the abilities or prior knowledge of Person 2. For an 
external visualization to substitute for an internal visualization in this way, the 
following conditions would have to hold. First, the external representation of a 
phenomenon would have to be isomorphic to its internal representation. Second, 
merely viewing the external representation would have to be sufficient to recreate an 
internal copy of this representation. 

Are Internal Representations Isomorphic to External Representations? The idea 
that the internal representation of a phenomenon is isomorphic to its external 
representation is related to “the resemblance fallacy” proposed by Scaife and Rogers
(1996). Research on mental representations of motion in simple mechanical systems
demonstrates how difficult it would be to create an external visualization that is
isomorphic to a person's internal representation of a mechanical system. When a 
machine (e.g. a pulley or gear system) is in motion, many of its components move at 
once. Therefore, most people would agree that a realistic computer visualization 
representing the motion of objects in a mechanical system would also show its 
components moving simultaneously. In contrast, when people attempt to internally 
visualize how a machine works, they infer the motion of components one by one, in 
order of a causal chain of events (Hegarty, 1992; Hegarty & Sims, 1994; Narayanan,
Suwa & Motoda, 1994, 1995). That is, their mental representation is of a causal
sequence of events rather than of a set of simultaneous events. In externally 
representing how a machine works, one therefore has to choose between showing the 
components moving sequentially (as the process is conceptualized by most people) or 
simultaneously (as they move in the real world). Furthermore, although inferring the 
motion of mechanical systems often depends on spatial visualization processes 
(Hegarty & Sims, 1994; Sims & Hegarty, 1997; Schwartz & Black, 1996), it can also
be based more on verbal rules of mechanical inference, so that most current theories 
of people's mental models of machines allow for hybrid representations involving 
both visual-spatial representations and verbal rules (Schwartz & Hegarty, 1996; 
Narayanan, Suwa & Motoda, 1994). Again, it is not clear how to translate such a 
hybrid representation into an external visualization. 

A related problem is that people's internal mental models of spatial objects and 
events are not purely visual-spatial, but may depend on other senses. For example, 
surgeons often report that they “see” with their hands as much as with their eyes 
during surgical procedures, suggesting that their mental representations of anatomy 
are based on both haptic and visual modalities. As a result, many common errors in 
minimally invasive surgery, carried out using instruments that are inserted through
small incisions in the skin, are attributed to loss of the haptic information that is 
available in open surgery. Use of pure visualizations, either in teaching anatomy or in
training surgical procedures, may therefore lead to an impoverished internal
representation compared to the experience that it is intended to simulate. These 
examples indicate that it may not be possible to externally represent the internal 
representation of a complex phenomenon in the visual modality alone. 

Does Viewing of an External Visualization Lead to a Veridical Internal 
Representation? The existence of visual illusions is evidence that perception of 
visual stimuli does not necessarily lead to a veridical internal representation of these 
stimuli. For example, an object in free fall is seen as falling with a constant velocity, 
although it is in fact constantly accelerating, and an object moving at a constant 
velocity is not seen as constant (Runeson, 1974). Such misperceptions are not just true 
for perception of apparent motion, as shown by a rapid sequence of static images in 
an animation, but also for perception of real motion. For example, before stop-gap 
photography, nobody knew how a horse's legs move while it is galloping, because 
they move too fast to be perceived accurately. As a result, the legs of galloping horses 
have been portrayed incorrectly in art throughout history (Tversky, Morrison, & 
Betrancourt, 2002). Therefore, viewing a motion event, either in the real world or in an
animation, does not necessarily lead to an accurate internal representation of the
event. 

The internal representation that results from viewing an object, scene, or event can 
also be dependent on the knowledge of the viewer. For example, when a chess 
grandmaster views a chess board in the middle of a game, he sees a small number of 
visual chunks representing groups of chess pieces in attack and defence relations to 
each other (Chase & Simon, 1973). In contrast, a novice chess player viewing the 
same chess board sees a large number of individual chess pieces. Similarly, 
perception of visual representations such as graphs, maps and animations can be 
influenced by knowledge of both the phenomena depicted in those representations and 
the graphical conventions used to visualize those phenomena (e.g. the use of color to 
show temperature and isobars to show pressure on a weather map). For example, 
Lowe (1999) has found that novices' interpretations of weather maps and animations
are largely descriptions of surface features of the maps and do not contain the types of 
insights into meteorological phenomena that are evident in experts' interpretations of 
the same maps and animations. Therefore, if the creator of a visualization and the
viewer of the same visualization differ in knowledge, it is very unlikely that the 
viewer will end up with the same mental representation of the referent object or event 
as that of the creator of the visualization. 

In summary, there is no reason to assume that the internal representation of an 
object or event is isomorphic to the object or event itself or to its external 
visualization. Similarly, there is no reason to assume that viewing an object or event 
or its external visualization will lead to a veridical internal representation. This 
suggests that the development of external visualizations alone is not a solution to the
problem of communicating about complex phenomena. The design of external
visualizations must also take account of the nature of people's internal visualizations.

Relation 2: External Visualization as Dependent on Internal Visualizations 

Rather than replacing internal visualizations, the use and comprehension of external 
visualizations might be dependent on internal visualization skills. In this case people 
with less spatial visualization skill or lower spatial ability would have poorer 
comprehension of external visualizations than those with more skill or more ability. A 
recent study on a classic visual illusion is a case in point. Isaak and Just (1995) 
studied ability to judge the trajectory of a point on a rolling ball, which involves both 
translation and rotation. In this situation, people are subject to an illusion called the 
curtate cycloid illusion, which can be explained by a model in which they temporarily 
fail to process the translation component of motion at a critical point in the rolling 
motion. Isaak and Just found that people with high spatial visualization ability were 
less subject to the illusion than people with low spatial visualization ability. They 
proposed that spatial working memory, necessary for generating internal 
visualizations, was also necessary to simultaneously process the rotation and 
translation components of motion in comprehension of the visual display. In complex 
visualizations in which several motions occur at once or the motion of several objects 
must be tracked, people with better internal visualization skills appear to have an 
advantage in comprehension. 
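
To make the two motion components concrete, the path of a point on a rolling wheel can be written parametrically. This is the standard textbook form of the curve, not a formula taken from Isaak and Just (1995). The symbols are introduced here for illustration only: r is the radius of the wheel, d is the distance of the tracked point from the center (d < r in the curtate case), and θ is the angle through which the wheel has rolled.

```latex
\begin{align}
x(\theta) &= \underbrace{r\theta}_{\text{translation}} \;-\; \underbrace{d\sin\theta}_{\text{rotation}} \\
y(\theta) &= r \;-\; d\cos\theta
\end{align}
```

The term rθ carries the translation of the wheel's center, while the sine and cosine terms carry the rotation about that center. Without the rθ term the path reduces to a circle of radius d about the center, which is one way to see why both components have to be processed together to perceive the trajectory accurately.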


Learning from external visualizations may also depend on ability to visualize 
internally. Mayer and Sims (1994) examined the role of spatial ability in learning 
from animations that explain how a mechanical system (car brakes) and a biological 
system (the human respiratory system) work. They considered two alternative 
hypotheses. The first hypothesis was that viewing an animation would compensate for 
low spatial ability (i.e., the animation would act as a prosthetic for those with poor 
internal visualization skills, as discussed above) so that low-spatial individuals would
learn more from animations than they do from static diagrams and the differences in
learning between high- and low-spatial individuals would be smaller for animations
than for static diagrams. The second hypothesis was that spatial ability would be a 
necessary prerequisite for learning from an animation. In this case high-spatial 
individuals would learn more from animations than low-spatial individuals and the 
difference between high- and low-spatial individuals would be greater in the case of 
animations than in the case of static diagrams. The results were consistent with the 
second hypothesis. That is, people with high spatial ability learned more from the
animation than people with low spatial ability, and the difference between learning of 
high- and low-spatial individuals was greater in the case of animations. 

In the field of medical education, spatial abilities have been found to be related to 
the ability to learn anatomical structure from visualizations that reveal the three- 
dimensional structure of the anatomy by showing rotating models. Garg, Norman, 
Spero & Maheshwari (1999) found that exposing students to non-interactive three- 
dimensional models of carpal bone configurations impaired spatial understanding of 
the anatomy for low-spatial students while it enhanced understanding for high-spatial 
students. This can be seen as another example of a situation in which internal 
visualization ability is a necessary prerequisite for learning from an external 
visualization. 

Finally, in some recent research in my laboratory, Cohen, Hegarty, Keehner and 
Montello (2003) examined students' ability to draw the cross-section of an object that 
results when the object is cut along a particular plane. Initial studies of this ability 
indicated that it is highly related to spatial ability. In a follow-up study, we allowed
students to interact with a three-dimensional computer visualization of the object as
they drew the cross-section. The students were allowed to interact with the computer
visualization to rotate the object, so that they could see it from different perspectives. 
This study again replicated the high correlation between spatial visualization ability 
and ability to draw the cross sections. A more surprising result is that there was also a 
significant correlation between spatial ability and use of the visualization such that the 
low-spatial individuals rotated the visualization less often than the high-spatial 
individuals. When interviewed, some of the low-spatial individuals indicated that they 
did not understand how the visualization could help them. This study indicates that in 
some situations, internal visualization abilities might be a prerequisite for being able 
to use an external visualization effectively. 

Relation 3: External Visualizations as Augmentations of Internal Visualizations 

As reviewed above, recent studies of external cognition describe external 
visualizations as amplifying, enhancing or augmenting cognition (Card, Mackinlay &
Shneiderman, 1999; Norman, 1993). This view suggests that there is an interplay
between internal visualization processes and comprehension of external
visualizations, or that insight or learning is based on a combination of internal 
visualizations and perception of external visualizations. Although there has been 
important research specifying how internal and external static representations interact 
in the performance of cognitive tasks (Zhang, 1996; Zhang & Norman, 1994), this
work has not focused on visualizations or their roles in learning about spatial 
structures or dynamic processes. 

One possibility is that by externalizing a mental visualization, a person relieves 
him or herself of the need to keep the internal representation in working memory, and 
this frees up processing resources for making further inferences (cf. Zhang & 
Norman, 1994). For example, Hegarty & Steinhoff (1997) studied the ability of high- 
and low-spatial individuals to “mentally animate” or infer the motion of components 
in a mechanical system from a static diagram of the system. Previous studies had 
shown that low-spatial individuals were poorer at mental animation and we suggested 
that this was because they had less spatial working memory and therefore could not 
maintain a spatial representation of the machine while simultaneously imagining the 
motion of its components. Hegarty and Steinhoff examined the effects of allowing 
high- and low-spatial individuals to make notes on diagrams of mechanical systems 
while they inferred the motion of components. They found that low-spatial 
individuals who made notes indicating the direction of motion of each component in 
the mechanical system were more successful at solving the mental animation 
problems, and in fact were as successful as high-spatial individuals. In this case, the 
external visualization was augmented by the problem solver as a result of his or her 
internal visualization processes. The augmented external display, in turn, relieved the 
participants of the need to maintain an internal visualization of the machine in 
memory. 

In another recent experiment, Hegarty, Kriz & Cate (in press) examined the roles 
of mental animation and viewing an external animation in understanding how a 
relatively complex machine (a flushing cistern) works. One group saw a static 
diagram of the machine, another group saw diagrams of the machine in different 
phases of its operation and had to generate an explanation of how the machine worked 
(mental animation), a third viewed an external animation of the machine, and a fourth 
group both mentally animated and viewed an external animation. The results 
indicated that there were positive main effects on comprehension of both mental 
animation (generating an explanation from a series of static views) and viewing an 
external animation, and no statistical interaction between these two variables. This 
pattern of results suggests that the effects of internal visualization (in this case mental 
animation) and external visualization (viewing an external animation) were 
complementary. As a result, those who first internally visualized and then viewed the 
external animation had better understanding than those who just mentally visualized 
or just viewed the external visualization. One possible interpretation of these results is 
that the mental animation process induced individuals to articulate their intuitions about 
how the machine works, so that when they later viewed the external visualization, 
they could compare these intuitions to the actual physical process shown in the 
animation and pay particular attention to the parts of the process that they could not 
mentally animate. 
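
The pattern of two main effects without an interaction can be stated compactly in terms of cell means. The notation below is introduced here purely for illustration and is not taken from the study: $\mu_{me}$ is the mean comprehension score of the group with mental animation $m \in \{0,1\}$ and external animation $e \in \{0,1\}$.

```latex
% Illustrative notation only; the symbols are assumptions, not the authors'.
\[
  \underbrace{(\mu_{11} - \mu_{10}) - (\mu_{01} - \mu_{00}) = 0}_{\text{no interaction}}
  \quad\Longrightarrow\quad
  \mu_{11} = \mu_{00} + a + b,
\]
\[
  \text{where } a = \mu_{10} - \mu_{00} > 0
  \text{ (mental animation)}, \qquad
  b = \mu_{01} - \mu_{00} > 0
  \text{ (external animation)}.
\]
```

Read this way, "complementary" means that the two benefits add: the group that did both is expected to exceed either single-treatment group by the size of the other treatment's effect.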

In both of these examples, we see that there is an interplay between external and 
internal representations and processes. In the first example, the students augmented 
the external display to show the results of their internal visualizations. In the second 
case the external animation also augmented students' internal visualizations, such that 
they learned additional information from this animation that was not revealed by 
mental animation. A similar process of interaction between external and internal 
displays has been found when experts interact with data displays (Trafton et al., in 
press; Trickett & Trafton, 2002). These studies suggest that effective external 
visualizations are closely tied to internal visualizations and suggest that it is important 
to take account of internal visualizations in designing external visualizations. 

Implications 

In summary, the studies reviewed above suggest that rather than replacing the need to 
visualize internally, use of external visualizations is often accompanied by internal 
visualization activities, and effective use of these tools may even depend on a certain 
level of spatial visualization abilities. This suggests that educational studies need to be 
concerned with the development of student's internal visualization abilities, as well as 
being concerned with the development of the most effective visualizations. 

Of course, one potential method of developing internal visualization abilities might 
be to expose students to external visualizations. For example, one approach might be 
to show external visualizations that model the internal processes that students are 
required to imagine in developing an internal visualization (Olson & Bialystok, 
1983; Cohen et al., 2003). However, we need research on whether people can learn to 
internally visualize from external visualizations, and the conditions under which this 
learning takes place. Specifically, we need to know what types of interactions people 
need to have with an external visualization in order to learn from it. Given that some 
individuals may not appreciate how a visualization can be helpful to them (Cohen et 
al., 2003) we also need to understand what types of instruction to give students on 
how to interpret and use external visualizations. Importantly, the design of 
visualizations, and of instruction about how to use visualizations, should also take 
account of students' internal visualization skills, as the research reviewed above 
suggests that the optimal form of instruction will be conditional on these. 

More generally we need to understand more precisely how external visualizations 
augment internal visualizations. I have suggested one possibility, which is that they 
relieve the problem solver of the need to maintain complex internal visualizations in 
working memory. This is similar to the idea of distributed representations proposed 
by Zhang and Norman (1994). However, there are many other ways in which external 
visualizations might augment internal visualizations. For example, it is likely that 
there are limits on the complexity of the spatial transformations that can be imagined 
within the limited capacity of spatial working memory. External visualizations might 
augment these internal visualizations by presenting somewhat more complex 
transformations externally. However, this raises the question of how complex an 
external visualization people can comprehend, and the research presented above 
suggests that this might be somewhat related to internal visualization skills. 

Finally, in addition to studying how external visualizations augment internal 
visualizations, we also need to study how internal visualizations augment external 
visualizations. When a person mentally animates a static image of a machine (Hegarty, 
1992) or when a scientist or meteorologist mentally manipulates an external display 
of some data (Trafton et al., in press; Trickett & Trafton, 2002), he or she is 
augmenting the external static display with inner spatial transformation processes. 

In summary, a review of the literature on understanding, learning from and 
producing external visualizations suggests that there is a complex interplay between 
internal and external visualizations, and that the design of effective external 
visualizations will be based on an understanding of internal visualization abilities. 
Rather than replacing the need for internal visualization processes, the development 
of technology to produce powerful external visualizations challenges us to better 
understand the nature of internal visualization processes, how to foster their 
development, and how they interact with external visualizations. 

3 Benefits 

The tutorial will give attendees: 

— A firm grasp of an important debate in the history of ideas. 

— A logical framework within which to assess questions of diagrammatic justi- 
fication. 

— Knowledge of some key philosophical arguments for and against the epistemic 
value of diagrams. 

— A deeper understanding of the nature of diagrammatic reasoning. 

4 Content 

The Issue. Can diagrams have epistemic value? Can reasoning with diagrams 
confer knowledge or justify belief? These fundamental questions have been little 
studied outside the area of logical diagrams, in which the formal syntax of an ade- 
quately specified given system rules out the possibility that correctly interpreted 
diagrams can mislead the user. 1 In Euclid’s geometry, however (the paradigm case 
of a body of exact knowledge until the mid-19th century), precisely this possibility 
seems to exist. And there is the further problem of generality: how reasoning 
with a single diagram can justify knowledge of a general mathematical claim. 

Traditional Responses. Accordingly, the dominant view among philosophers and 
logicians, following Bertrand Russell, has long been that the diagrams have no 
epistemic value but are “merely heuristic”; on an older view, which dates back to 
Kant, they do have epistemic value, but only via a dubious appeal to a postulated 
faculty of “intuition”. 

Challenging the Tradition. The tutorial challenges these claims, and the back- 
ground assumption that they exhaust the available alternatives. Against the 
“heuristic” view, it shows that diagrams can be of genuine epistemic value, us- 
ing Euclid’s geometry as a case study. Against the “intuitive” view, it shows 
that this epistemology need make no appeal to a faculty of intuition. Thus the 
traditional debate mistakenly ignores a third, crucial, possibility. 

Structure. The discussion breaks down into three parts. The first part sets up 
the problem and the logical space of alternative solutions; the second explores 
the candidate solutions themselves; the third selects, elaborates and defends the 
preferred solution. To make the analysis and subsequent discussion as specific 
as possible, the discussion is focused on a single argument: Prop. I.32 of the 
Elements, to the effect that all triangles have internal angles that sum to two 
right angles: the so-called “angle sum” property. 

— The Problem and the Solution Space. The tutorial introduces a logically 
exhaustive Framework of Alternatives, covering different theories that can be 
advanced to account for the apparent justification offered by this reasoning. 

— Exploring Candidate Solutions. The tutorial appraises four candidate the- 
ories that might be advanced in each of the categories already identified. 
These theories can be plausibly attributed to an interpretation of Plato by 
W.D. Ross, to J.S. Mill, to Leibniz, and to Kant. Each theory holds that 
Euclid’s argument confers justification, but they differ as to how it does so. 

— Developing the Preferred Solution. Three of the candidate theories can be 
shown to fail. The tutorial then defends a version of the fourth view: it high- 
lights some of its distinctive features and commitments; it shows how it meets 
three main lines of criticism, and it meets further logical and epistemological 
tests. 

1 Overall, see e.g. the collections Glasgow et al. [1], and Blackwell [2]. Greaves [3] 
gives a broad philosophical survey of diagrams in geometry and logic, but does not 
devote detailed consideration to the epistemology of reasoning with diagrams as 
such. For diagrams in computing/AI, see e.g., Sowa [4] and Jamnik [5]; in logic, 
see e.g. the works of Barwise and his collaborators Etchemendy and Allwein, and 
Barwise’s students Shin, Shimojima and Hammer; and, for a case study comparing 
inference using diagrams and sentences in propositional logic, see Norman [6]. 

Significance. Questions of epistemic value are fundamental to current philosophical 
and logical research on diagrams. Historically, virtually every major philosopher 
of the period 1600-1850 discusses Euclid’s geometry (including Descartes, 
Gassendi, Leibniz, Hobbes, Hume, Locke, Berkeley, and Kant), and most of them 
discuss Prop. I.32 in particular. Thus the tutorial does not merely address an important 
philosophical problem; it also situates and resolves a well-known debate 
in the history of ideas. 

5 Audience 

The argument of Prop. I.32 is well-known and easy to understand, making the 
case study approach used here accessible to researchers in all disciplines. The 
tutorial, which is based on and extends previously published work, uses no special 
logical formalism or technical apparatus. It should be of interest to a wide range 
of researchers into diagrams, including in such areas as cognitive science, artificial 
intelligence, education and design. 

6 Instructor Background 

Jesse Norman is currently Departmental Fellow, Department of Philosophy, Uni- 
versity College London. He has an MA from Merton College, Oxford University 
in Classics, and an MPhil and PhD in Philosophy from University College Lon- 
don, and has been the recipient of numerous academic awards and prizes. 
Relevant publications include: 

“Peirce Provability and the Alpha Graphs”, Transactions of the C.S. 
Peirce Society, Winter 2003. 

Visual Thinking in Euclid’s Geometry: An Epistemology of Diagrams 
(University College London: PhD Thesis). 

“Iconicity and ‘Direct Interpretation’”, Multidisciplinary Studies of Visual 
Representations and Interpretations (Elsevier Science 2002). 

“Differentiating Diagrams: A New Approach”, in Anderson, M., Cheng, P., 
and Haarslev, V. (eds.), Theory and Application of Diagrams (Berlin, 
Springer 2000). 

The Achievement of Michael Oakeshott (Duckworth, 1992). 

7 Requirements List 

The tutorial will require an OHP. There are no other requirements. 

1 Tutorial Topics 

Thanks to recent as well as age-old theoretical studies, we now find at least four 
important concepts that seem to capture the crucial functional traits of varieties 
of graphical representations. They are, roughly, the following concepts: 

1. Free ride properties: expressing a certain set of information in the system 
always results in the expression of another, consequential piece of informa- 
tion. The concept has been suggested or proposed, under various names, as 
an explanation of certain automaticity of inference conducted with the help 
of graphical systems (Lindsay [1]; Sloman [2]; Barwise and Etchemendy [3]; 
Larkin and Simon [4]; Shimojima [5]). 

2. Auto-consistency: incapacity of the system to express a certain range of in- 
consistent sets of information. The concept has been suggested as an expla- 
nation of the ease of consistency inferences done with the help of graphical 
systems (Barwise and Etchemendy [6]; Barwise and Etchemendy [7]; Sten- 
ning and Inder [8]; Gelernter [9]; Lindsay [1]; Shimojima [10]). 

3. Specificity: incapacity of the system to express certain sets of information 
without choosing to express another, non-consequential piece of informa- 
tion. The concept has been suggested or proposed as an explanation of the 
difficulty of expressing “abstract” information in certain graphical systems. 
(Berkeley [11]; Dennett [12]; Pylyshyn [13]; Sloman [2]; Stenning and Ober- 
lander [14]; Shimojima [5]). 

4. Meaning derivation properties: capacity to express semantic contents not 
defined in the basic semantic conventions, but only derivable from them. 
The concept has been offered as an explanation of the richness of semantic 
contents of graphics in certain systems. (Kosslyn [15]; Shimojima [16]). 
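
The first three concepts can be illustrated with a deliberately tiny graphical system. The sketch below assumes a "diagram" that is nothing more than a left-to-right row of tokens; the row system and the helper names `read_off` and `rows_expressing` are hypothetical and are not taken from any of the cited works. Meaning derivation is not covered.

```python
from itertools import permutations


def read_off(row):
    """Return every (x, y) pair such that token x is drawn left of token y."""
    return {(x, y) for x, y in permutations(row, 2) if row.index(x) < row.index(y)}


def rows_expressing(tokens, wanted):
    """All rows over `tokens` that display every pair in `wanted`."""
    return [list(p) for p in permutations(tokens) if wanted <= read_off(list(p))]


# Free ride: drawing "A left of B" and "B left of C" also displays "A left of C".
free_ride = all(("A", "C") in read_off(r)
                for r in rows_expressing("ABC", {("A", "B"), ("B", "C")}))

# Auto-consistency: no row can display both "A left of B" and "B left of A".
auto_consistent = rows_expressing("ABC", {("A", "B"), ("B", "A")}) == []

# Specificity: every row showing "A left of B" and "A left of C"
# is forced to settle the order of B and C as well.
specific = all(("B", "C") in read_off(r) or ("C", "B") in read_off(r)
               for r in rows_expressing("ABC", {("A", "B"), ("A", "C")}))
```

In this toy system the same spatial constraint (a row has exactly one order) is responsible for all three traits, which is one reason it is convenient to treat them together.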

The purpose of this tutorial is to give an accurate but accessible summary of 
these “fruits” of the previous research into graphical representations, formulat- 
ing their exact contents, exposing their explanatory ranges, and exploring their 
possible modifications or extensions. 

This tutorial is divided into three stages. In the first stage, I will first offer 
a small running example illustrating all the four concepts in a simple, but ac- 
curate manner. To ensure the accuracy of the illustration and to facilitate more 
detailed learning on the part of the audience, I will always refer to the original 
works that have suggested or proposed the concept in question, sometimes citing 
their own examples. 

After ensuring the intuitive grasp of each concept with these examples, I will 
offer a more accurate reconstruction of each concept. I need to make the 
reconstructions precise enough to determine the application ranges of the concepts in 
question, but the logical apparatus used for reconstructions will be kept mini- 
mal to ensure the accessibility to those participants with little acquaintance with 
logic and related mathematics. 

In the third stage, we will explore many further examples of graphical 
systems to see how far these concepts are applicable as explanations, what 
tional traits of what graphical representations they fail to capture, and what 
refinements or modifications would be necessary to extend their explanatory 
ranges. In particular, we will discuss various theoretical works in diagrammatic 
reasoning, and investigate the relationship between our four concepts 
and the ideas offered in those works. Depending on time available, we hope 
to cover such concepts as: “locality” (Larkin and Simon [4]), “analog and digi- 
tal representation” (Dretske [17]), “perceptual inference” (Larkin and Simon [4]; 
Narayanan et al. [18]), “mental animation” (Hegarty [19]), “law-encoding di- 
agram” (Cheng [20]), and “spatial transformation” or “hypothetical drawing” 
(Schwartz [21]; Trafton and Trickett [22]; Shimojima and Fukaya [23]). Examples 
of graphical systems will also be taken from collections of real-world graphics 
such as Bertin [24], Tufte [25, 26, 27] and Wildbur and Burke [28]. 

2 Benefit 

The tutorial will serve as an interim report of the theoretical research on the 
functional traits of graphical representations conducted so far, as well as an 
impetus for further development. Also, it will end up with a fairly comprehensive 
survey of the literature in this theoretical area. Such a survey/tutorial is 
especially important since the works in this area are typically scattered over 
diverse fields such as AI, cognitive psychology, philosophy, logic, and information 
design, conducted in different methods, vocabularies, and degrees of technicality. 
This prohibits an easy overview of various results, proposals, and suggestions 
offered in the area. The audience will obtain an accessible summary of these 
results and ideas, described in a single, systematic conceptual set. 

3 Audience 

Any researcher interested in theoretical analyses of the inferential and expressive 
capacities of graphical representations should be interested in the tutorial, no 
matter what field she or he may be in: computer science, psychology, philosophy, 
logic, or AI. Practitioners of information design will also find the summary of 
theoretical results useful. The exploration of the literature of information design 
planned for the tutorial will have direct connections with the more practical side 
of graphics research. The tutorial will be designed so as not to require any special 
background knowledge or mathematical maturity on the part of the audience, 
except for the willingness to handle a certain level of abstract ideas. 

4 Instructor Background 

The instructor has been associate professor of the School of Knowledge Science 
for 5 years. He teaches graduate level courses in logic and cognitive science, 
and supervises master- and doctorate-level research in related fields. He is also 
a visiting researcher to ATR Media Information Science Laboratories. His edu- 
cational background is in philosophy and mathematical logic, and his research 
is centered around the efficacy of different types of representations in human 
problem-solving and communication. He publishes mainly in the fields of di- 
agrammatic reasoning and graphics communication, covering both logical and 
empirical approaches to the issues. 

1 Introduction 

Modern logic has been provided since its very inception in 1879 with a diagrammatic 
outlook: Frege's two-dimensional symbolism in his Begriffsschrift. This was supposed 
to surpass specific limitations both of natural and of other constructed languages. 
However, it did not receive the attention paid to Frege's purely logical innovations. 
This fact was partly due to the common opinion, informed by logicians like Venn, 
Schroder and Peano, and heavily influenced by Russell's overstatement that the 
symbolism was ‘unfortunately so cumbrous as to be very difficult to employ in 
practice' [1]. This was rather ironic, I believe, as Frege devised it exactly for the 
purpose of assisting our inferential practice: ‘its chief purpose should be to test in the 
most reliable manner the validity of a chain of reasoning' [2]. The main thrust of my 
paper is to show that a particular point raised by Schroder - that Frege's conceptual 
notation fails to be modelled on the formula language of arithmetic - is based on a 
misunderstanding. I will then describe what seems to me the most advantageous 
aspect of Frege's diagrams, and give a serious reason for their eventual cast-off. But 
first let's look at some of them. 

2 The Goal: Rigor and Perspicuousness 

Frege remarked in the preface to Begriffsschrift that the conceptual notation was 
intended to boost a large philosophical project, as it was supposed to be applied not 
only to arithmetic, but also to geometry, topology, pure kinematics, mechanics, 
physics, ‘wherever a special value must be placed on the validity of proofs', and to 
become a useful tool even in philosophy. The main constraints on this application 
were: (i) to keep the chains of deductive reasoning free of gaps ‘so that nothing 
intuitive could squeeze in unnoticed', (ii) to reject all proper names without reference, 
and (iii) to expel any psychological ideas. 

In [4], however, Frege offers some surprisingly psychological justifications to 
warrant the use of a two-dimensional symbolism: ‘The spatial relations of written 
symbols on a two-dimensional writing surface can be employed in far more diverse 
ways to express inner relationships than the mere following and preceding in 
one-dimensional time, and this facilitates the apprehension of that to which we wish to 
direct our attention. In fact, simple sequential ordering in no way corresponds to the 
diversity of logical relations through which thoughts are interconnected.' 

* Frege’s diagrams have been only recently noted as an endeavor to logically reason with 
diagrams (e.g., in [3]), but no analysis has yet been offered. The present paper is motivated 
by this state of affairs. I am very grateful to Patricia Blanchette for her comments and 
suggestions. 

Frege firstly rejects audible symbols, which in spite of their more intimate relation 
with the ‘mind's life' cannot have upon ideas the effect that is required of a logical 
notation. Secondly he criticizes written symbols employed by other logicians 
(especially the algebraists) for lacking conceptual content: ‘any attempt to replace the 
single letters with expressions of contents [...] would demonstrate with the resulting 
imperspicuity, clumsiness - even ambiguity - of the formulas how little suited this 
kind of symbolism is for the construction of a true concept-notation' [4] 1 . There are 
two issues involved here: rigor and perspicuousness. A language is not rigorous when 
it fails to meet one or more requirements mentioned above, (i) - (iii). A language is 
perspicuous if in capturing the relational structure of thoughts it facilitates our 
apprehension of this structure. 

Today, it is commonly considered that the two-dimensional symbolism in 
Begriffsschrift cannot render the validity of inferences as perspicuous as modern 
notation is capable of doing (see, e.g., [5]). Since modern notation has developed 
from alternative logical notations, one could say that this warrants at the outset 
(though only a posteriori) the rejection of Frege's diagrams. I wish to argue now that, 
apart from this justification - conspicuously unavailable to Frege's contemporaries - 
another important criticism, due to Schroder, fails to go through 2 . 



Fig. 1. (1) says: If B, then A. (2) introduces the quantification over variables. (3) introduces the 
quantification over functions, and says: If x has a property F, if F is hereditary in the 
f-sequence (3'), and if y follows x in the f-sequence (3''), then y has the property F 



1 This answers another criticism raised by Schroder, which is not addressed here, i.e., that 
Frege’s notation is in fact much worse than Boole’s. But the use, in the latter, of ‘+’ both as 
a logical and as an arithmetical sign is certainly ambiguous, as Frege contends. 

2 For a more extensive analysis which, however, makes a different point, see [6]. 

24 Iulian D. Toader 



3 The Problem: Modelling - Conceptual vs. Notational 

Frege compared the advantages of his notation with the superiority of microscopes 
for scientific purposes over the capacities of the naked eye [2]. In his review of the 
Begriffsschrift, Schroder mentions the ‘monstrous waste of space' as a last (rather 
typographical) reason for rejecting the concept-notation [7]. But this surely sounds 
strange: it is as if someone would reproach Galileo for the ‘terrible' waste of material 
used for his telescope 3 . A more serious motive behind Schroder's harsh objection is 
Frege's claim that his notation is modelled on the language of arithmetic. Merely 
skimming a few pages of Begriffsschrift is enough to make one 
suspicious of Frege's contention. You don't find such things in arithmetic! Of course, 
Frege mentions the fact that arithmetical equations and inequalities are written one 
under another as they logically follow from one another, and so, in a two-dimensional 
manner. The conceptual notation is then modelled on this two-dimensional 
arithmetical structure. However, in the preface to Begriffsschrift he says: ‘The 
modelling upon the formula language of arithmetic to which I have alluded in the title 
refers more to the fundamental ideas than to the detailed structure' [2]. One of the 
fundamental ideas was to reduce the arithmetical concept of ordering-in-a-sequence 
to the concept of logical ordering. It was this, more than the two-dimensional way of 
arranging arithmetical equations, that made Frege speak of his conceptual script as 
modelled on the language of arithmetic. The modelling is a conceptual, not a 
notational, issue, and one cannot use one to criticize the other. Schroder missed the 
point, and that's why he considered the contention about the modelling the very point 
in which Frege's book corresponds least to its program [7]. 

4 Intuition Squeezed In 

Let's now look closer at one of Frege's diagrams, (2) in Fig. 1 above. The translation 
in our modern notation, ∀a(h(a) → g(a)) → (∀a(g(a) → f(a)) → (h(x) → f(x))), gets 
pretty complicated if we take it as symbolizing an inference in functional analysis, 
and plug in some analytic representations for our functions f, g, and h, but admittedly 
there are further means to simplify it. However, an advantage of Frege's diagram is 
that we can more efficiently visualize and grasp the logical structure of the inference. 
I claim that this is due to an appeal to our intuition. It is like the use of a map for 
orientation (in the ‘logical space'), against some verbal (i.e., one-dimensional) 
directions. It is obviously more difficult to find the right way to Russell Square by 
following the indications of a British fellow, than by simply looking on a tube map! In 
Frege's case, I contend, we see (perceive) the diagram, and therefore are inclined to 
see further (intuit) through the diagram, into the objective domain of concepts (which 
is of course not to say that this is how we primarily get access to this domain). 
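
For readers who want to check the modern-notation formula itself, the following is a minimal sketch in Lean 4 showing that it is valid. The type `α` and the proof term are supplied here for illustration and are not part of Frege's presentation; `f`, `g`, `h` and `x` follow the formula above.

```lean
-- Sketch: from "every h is a g" and "every g is an f",
-- anything that is h is also f.
example {α : Type} (f g h : α → Prop) (x : α) :
    (∀ a, h a → g a) → (∀ a, g a → f a) → h x → f x :=
  fun hg gf hx => gf x (hg x hx)
```

The proof term simply chains the two universally quantified conditionals at `x`, which is the logical structure the two-dimensional diagram is meant to display at a glance.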



3 Strange are also more recent remarks that Frege’s symbolism is ‘too original, and contrary 
to the age-old habits of mankind, to be acceptable’ (Bochenski, in [8]), as if the age-old 
habits of mankind were not to draw pictures, but abstract symbols. 

Does this appeal to intuition mar the rigor of Frege's concept-notation? No, I'd say, 
because this intuition is not asked here to warrant either entailment or truth 4 , and so it 
does not go against point (i) above, but it simply is the way we use the symbolism in 
order to ‘test' the logical structure that it purports to represent. 

5 Trade-Off, so Cast-Off? 

Now, even if my claim is accepted, one can raise another question: can we contend 
general applicability for Frege's diagrams? For example, are they rich enough to 
cover all possible relations in the logical space of thoughts? Unfortunately, as is the 
case with most graphical representations, any attempt to enrich them impairs their 
perspicuousness 5 . Frege's introduction of Gothic letters to label logical quantifiers 
(not to mention the second-order logic stuff in the third chapter of Begriffsschrift, 
e.g., his graphical expressions for ‘following in an f-sequence') did exactly this, and so 
ruined to a degree even the raison d'être of his symbolism - to assist inferential 
practice. But, as we have seen, this was not the ground for its premature rejection. 
The ground was rather misunderstanding and, as Russell avowed more than half a century later 6 , 
simply idleness about learning Frege's symbolism. 

Abstract. A fundamental research activity in the field of Information 
Systems involves the proper design of data based systems. An important 
stage in construction is to determine the relevant, meaningful informa- 
tion structures in a domain, and to document these in an accurate and 
unambiguous way. Diagrammatic modeling notations have evolved as 
tools to facilitate this process. However, an appropriate formal semantics 
to clarify the interpretation of these notations is difficult to define. This 
can result in models that are subjective and difficult to interpret by exter- 
nal parties. Recently, philosophical ontologies that provide a taxonomy 
of elements in the world have been proposed as a foundation to ground 
the symbols in diagrams. We argue that models represent a designer's 
psychological perception of the world rather than some abstract descrip- 
tion of that world. An ontology of these perceptions is therefore more 
relevant for the design of diagrammatic notations used in documenting 
and unambiguously communicating the analysis of a domain. We present 
an ontology of mental concepts from cognitive science, and find support 
for a prediction concerning ternary relations. Importantly, an influential 
thesis based on a philosophy of “real world” ontology makes the opposite 
prediction. We show that properties of the mind, rather than the world, 
should guide diagramming convention. 



1 Introduction 

What is the right formal foundation for a notational system that captures proper- 
ties of systems faithfully, and allows for clear communication between interested 
parties? This paper argues that a notational system based on the properties of 
our cognitive perceptual system will facilitate the construction of faithful and 
clear models. In support, we describe an experiment that compares a notation 
predicted by philosophical ontology against one based on psychological theory. 

1.1 The Ontology of the Mind 

Jackendoff ([1983]) proposes a theory of conceptual structure based largely on 
evidence from syntax - semantics mappings in natural language. Jackendoff stip- 
ulates that conceptual structures can be described by a small number of major 
[Entity]

Event/Thing/Place/...

Token/Type

F(<Entity-1, <Entity-2, <Entity-3>>>)

Fig. 1. A general structure for Conceptual Constituents



conceptual categories. These include [Event], [Thing], [Place], [Path]. Lexical 
items serve as functions which map their arguments into one of the ontological 
categories. For instance, in the sentence John is tall, the verb be is a state-function
that maps its arguments (found in the subject and predicate adjective position)
into a [State]. Functions can take zero or more argument places. Figure 1 shows
a general specification for conceptual structures. Note the recursive nature of 
the definition. 

1.2 Complements and Adjuncts 

By hypothesis, there is a difference in the conceptual representation of modifiers 
known as Complements and Adjuncts , originating in the fact that the former is 
syntactically more closely linked to the verb than is the latter. The difference 
is illustrated below, where example 1 shows the tight restrictions placed on the 
complement (grocery stores), but not on the adjunct (in Washington) in example
2. (There are other syntactic tests, not shown here for reasons of space.)

1. Sarah robs/*eats/*marries/*sleeps grocery stores in New York. (comp.-adj.)

2. Adam robs/eats/marries/sleeps in Washington in June. (adj.-adj.)

If this syntactic difference is reflected in Conceptual Structure, then it should be 
reflected in perceived differences of the relationships between the entities that 
are involved. These should then affect the way the scenarios are modeled. 

2 Experiment 

According to Wand, Storey and Weber ([1999]), there is no systematic distinc- 
tion in n-ary relationships in their philosophical ontology. They should all be 
modeled alike. Our theory predicts a difference between adj.-adj. and comp.-adj. 
sentences. We gave subjects a choice of a “standard” notation for n-ary relations, 
and a novel representation that was designed to capture perceived conceptual 
relations more accurately. Figure 2 shows the alternatives for sentence 1 above. 
Subjects were shown both diagrams for all sentences, and asked to choose which
they preferred in the way it described the scenario. 

2.1 Results 

The results were straightforward. Figure 3 shows the mean number of preferences 
for the conditions. A chi-square test for independence revealed that the sentence 
Fig. 2. A standard n-ary relation and the new notation




Fig. 3. Total number of preferences for each sentence type 



type and notation were not independent in determining preference. There was 
a disproportionate preference for the new notation for comp.-adj. sentences: Chi- 
square = 10.23, p < 0.01. 
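
As a rough plausibility check, the p-value can be recomputed from the reported statistic alone. The sketch below assumes one degree of freedom (two sentence types by two notations); the text does not state the degrees of freedom, so this is an assumption. For one degree of freedom, the upper tail of the chi-square distribution reduces to the complementary error function. The function name `chi2_sf_df1` is ours.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X >= x) for chi-square with 1 degree of freedom.

    For df = 1, X is the square of a standard normal variable Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# Statistic reported in the text; df = 1 is an assumption (2 x 2 design).
reported_chi_square = 10.23
p_value = chi2_sf_df1(reported_chi_square)
print(f"p = {p_value:.4f}")
```

Under this assumption the recomputed value lies below 0.01, in agreement with the reported significance level.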

3 Conclusion 

Subjects tended to prefer the new notation in the complement-adjunct condition. 
Our explanation is that this is due to a difference in the conceptual representation 
of the situations described. In the absence of an alternative hypothesis that can 
explain the pattern of results, we conclude that, in this experiment, psychology,
not philosophy, is the proper guide to modeling.
Abstract. When formalising diagrammatic systems, it is quite common
to situate diagrams in the real plane, $\mathbb{R}^2$. However, this is not necessarily
sound unless the link between formal and physical diagrams is examined.
We explore some issues relating to this, and potential mistakes that can
arise. This demonstrates that the effects of drawing resolution and the
limits of perception can change the meaning of a diagram in surprising
ways. These effects should therefore be taken into account when giving
formalisations based on $\mathbb{R}^2$.



1 Introduction 

When formalising diagrammatic systems, it is quite common to situate diagrams
in the real plane, $\mathbb{R}^2$. Curves and points in the diagram are associated with
curves and points in the real plane. Results from real analysis - most commonly
the Jordan Curve Theorem¹ - can then be used to prove various properties of
the representation system.

However, this does not guarantee these properties unless the link between
diagrams ‘drawn’ in $\mathbb{R}^2$ and actual physical diagrams is examined. Familiarity
makes it easy to forget that $\mathbb{R}^2$ is a technical mathematical construction, and
not the same as a physical plane. Caution is suggested by the fact that the
Jordan Curve Theorem does not hold for $\mathbb{Q}^2$ - which is a better approximation
to $\mathbb{R}^2$ than any physical drawing surface. We must take into account the limited
precision of drawing tools, and the limits to which people using the diagram can
accurately identify the objects drawn. In $\mathbb{R}^2$, it is possible to draw infinitely thin
curves and distinguish between arbitrarily close points. This is, of course, not
possible for any physical surface on which a diagram might be drawn. Another
discrepancy is that $\mathbb{R}^2$ is not bounded.

To analyse the possible effects of these discrepancies, let us suppose that
diagrams are produced by a drawing function that converts diagrams in $\mathbb{R}^2$ into
physical diagrams (either on paper or a computer screen). We may plausibly
assume that given a diagram consisting of curves and points in $\mathbb{R}^2$, its physical
drawing is a ‘blurred’ version of the original (where points have an area, lines have
a width, and the drawing process can introduce some small errors). Measurement
errors will also occur when reading the diagram, adding another level of blurring.

At least two problems can occur in drawing objects from $\mathbb{R}^2$:

1 “All non-intersecting closed curves in $\mathbb{R}^2$ are homeomorphic to the unit circle” - and
hence have a well-defined inside.
Fig. 1. Example of a mistake arising from the appearance of equality. Spotting the
error behind this paradoxical diagram is left as a puzzle for the reader



Fig. 2. Example of how using inside to represent $\in$ can produce representation errors:
the left-hand diagram shows $a \in B$, which turns out to be false on magnification



1. False equality statements can be generated, and in several different ways: If 
two points are close but not equal, two lines almost but not quite parallel, 
etc. these distinctions will be lost in the physical diagram. Figure 1 is a classic 
example 2 of this, which has implications for proofs such as the diagrammatic 
proof of Pythagoras’ Theorem. 

2. Many diagram systems use loops to represent sets (with inside used for
$\subset$ / $\in$). Clearly, drawing at too low a resolution can obscure such relations,
but that only results in lost information. More worrying is the possibility that 
false relations might appear as ‘artifacts’ of the drawing process. Figure 2 
shows how this can happen. 
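
The first problem can be illustrated with a small sketch. It models the drawing function, under a strong simplifying assumption, as snapping exact coordinates to a pixel grid; the function `draw` and the chosen points and resolutions are hypothetical and only serve as an illustration. Two points that are distinct in the formal plane collapse onto the same pixel at low resolution, and are separated again at a higher one.

```python
from fractions import Fraction

def draw(point, pixel):
    """Hypothetical drawing function: snap an exact point to a pixel grid.

    `pixel` is the side length of one pixel; the result is the pixel index.
    """
    x, y = point
    return (round(x / pixel), round(y / pixel))

# Two distinct points with rational coordinates that lie very close together.
p = (Fraction(1, 3), Fraction(1, 2))
q = (Fraction(1, 3) + Fraction(1, 1000), Fraction(1, 2))

coarse = Fraction(1, 100)     # low drawing resolution
fine = Fraction(1, 10000)     # high drawing resolution

print(p == q)                              # distinct in the formal plane
print(draw(p, coarse) == draw(q, coarse))  # same pixel: a false equality
print(draw(p, fine) == draw(q, fine))      # separated at higher resolution
```

A reader of the coarse drawing has no way of telling which resolution would separate the two points, which is why zooming alone does not remove the problem.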

2 Original source unknown. 
2 Solutions

Note that using diagram viewers with a ‘zoom’ ability does not solve these 
problems, since the user cannot know what level of magnification is required 
to reveal any errors. However the problems raised here can be dealt with in 
a rigorous fashion, and in several ways, including: 

— We can restrict what diagrams are allowed to say (i.e. what information can 
legitimately be read from a diagram). 

— We can restrict ourselves to diagrams where problems cannot arise. This can 
be done by identifying classes of diagrams which are immune to problems. 
For example, it can be proved - subject to very plausible assumptions - that 
the ‘closing eye’ structure shown in figure 2 is the only way in which errors
involving $\subset$ / $\in$ can occur (see [1]), and that restricting diagrams to using
convex curves prevents this. 

— Where diagrams are computer-generated, the computer could detect that 
such errors have occurred in drawing the diagram (by analysing the bitmap 
produced or otherwise), and warn the user. This is the approach we have 
taken in our Dr. Doodle system for analysis theorem proving [2]. 

3 Conclusion 

We have shown that some caution is necessary when applying results from real 
analysis to diagrams. The issues raised here do not threaten diagrammatic rea- 
soning though: these problems are unlikely to occur in ‘normal use’ of most sys- 
tems, and can be seen as merely technical difficulties in formalising diagrammatic 
reasoning. Moreover, where diagrams are computer generated (which is surely 
the future of diagrammatic reasoning), such drawing errors can be automatically 
detected. We have, though, only examined drawing errors here. Ultimately, we 
would also like a theory of diagram reading errors, which would also cover ef- 
fects such as optical illusions. This is a much harder requirement, and any such 
theory must be based in a cognitive science understanding of how people process 
diagrammatic representations. 

We would like to acknowledge Corin Gurr’s help in this work. 



References 

[1] D. Winterstein, “On Differences Between the Real and Physical Plane:
Additional Proofs”, Informatics Report Series, available online at
http://homepages.inf.ed.ac.uk/s9902178/physicalDiagrams.pdf, Edinburgh
University, 2003.

[2] D. Winterstein, A. Bundy, C. Gurr & M. Jamnik, “Using Animation in Diagrammatic
Theorem Proving”, in Diagrammatic Representation and Inference,
Springer-Verlag, 2002.



Query Graphs with Cuts: 
Mathematical Foundations 



Frithjof Dau 

Technische Universität Darmstadt, Fachbereich Mathematik
Schloßgartenstr. 7, D-64289 Darmstadt
dau@mathematik.tu-darmstadt.de



Abstract. Query graphs with cuts are inspired by Sowa’s conceptual 
graphs, which are in turn based on Peirce’s existential graphs. In my 
thesis ‘The Logic System of Concept Graphs with Negations’, conceptual 
graphs are elaborated mathematically, and the cuts of existential graphs 
are added to them. This yields the system of concept graphs with cuts. 
These graphs correspond to the closed formulas of first order predicate 
logic. Particularly, concept graphs are propositions which are evaluated 
to truth-values. In this paper, concept graphs are extended to so-called 
query graphs, which are evaluated to relations instead. As the truth- 
values TRUE and FALSE can be understood as the two 0-ary relations, 
query graphs extend the expressiveness of concept graphs. 

Query graphs can be used to elaborate the logic of relations. In this sense, 
they bridge the gap between concept graphs and the Peircean Algebraic 
Logic, as it is described in Burch’s book ‘A Peircean Reduction Thesis’.
But in this paper, we focus on deduction procedures on query graphs, 
instead of operations on relations, which is the focus in PAL. Particularly, 
it is investigated how the adequate calculus of concept graphs can be 
transferred to query graphs. 



1 Introduction and Overview 

At the dawn of modern logic, two important diagrammatic systems for
mathematical logic were developed. One of them is Frege’s Begriffsschrift. The
ideas behind the Begriffsschrift had an influence on mathematics which can
hardly be overestimated, but the system itself was never used in practice.
The other diagrammatic system is Peirce’s existential graphs, which are
unfortunately not known by many mathematicians. Nonetheless, a lot of research has
been done on existential graphs, and they have influenced other diagrammatic
systems as well. Among these, Sowa’s system of conceptual graphs, which are
based on Peirce’s existential graphs and the semantic networks of artificial
intelligence, is the most important. Their purpose is ‘to express meaning in a form that
is logically precise, humanly readable, and computationally tractable’ (see [23]).
In fact, conceptual graphs yield a powerful diagrammatic system with a higher
expressiveness than existential graphs. But a closer observation shows that their
definitions lack (mathematical) preciseness, which leads to several ambiguities,
gaps and flaws (see [3]).
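[Figure: three graphs. Left: an existential graph built from the predicates cat, on and mat. Middle: a concept graph with cuts with the concept boxes cat: Yoyo and mat: *, linked by the relation on. Right: a concept graph with cuts with the concept boxes CATHOLIC: * and WOMAN: *, linked by the relation adore.]

Fig. 1. An existential graph, a similar concept graph and a second concept graph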



In order to fix these gaps and flaws, a mathematical elaboration of conceptual
graphs is appropriate. Wille, who is, like Sowa, strongly influenced by the
philosophy of Peirce, introduced in [28] an approach for such an elaboration, combining
Sowa’s graphs and his theory of Formal Concept Analysis. The resulting graphs
are called concept graphs (and they are a crucial part of Wille’s Contextual
Logic, see [27, 29]). Until today, several systems of concept graphs with different
kinds of negations, quantifiers etc. have been developed (an overview of these
systems can be found in [6]). The system which will be used in this paper is
the concept graphs with cuts (CGwCs), which have the expressiveness of first
order predicate logic and are studied in detail by the author in [3].

Let us consider an example for existential graphs and concept graphs with 
cuts: 

The leftmost graph is an existential graph with the meaning ‘there is a cat 
which is not on any mat’. It is composed of predicates of different arities (cat, 
on, mat), so-called lines of identity which stand for objects and which are drawn 
bold, and finally of closed curves, so-called cuts , which are used to negate the 
enclosed subgraph. 

The graph in the middle is a concept graph with cuts. Instead of lines of iden- 
tity, concept boxes are used. These boxes contain a concept name and a referent. 
The star * is a special referent called generic marker. It can be understood as
an object which is not further specified (similar to a variable in first order logic 
which is existentially quantified, or similar to a wildcard in computer systems). 
Besides the generic marker, object names are allowed as referents as well. The 
ovals between concept boxes represent relations between the referents of the con- 
cept boxes. The cuts of existential graphs appear in concept graphs with cuts as 
well (as the name says). But now, as they should not be confused with relation 
ovals, they are drawn bold. The meaning of this graph is “Yoyo is a cat and it
is not true that there is a mat such that Yoyo is on this mat”, or ‘the cat Yoyo
is not on any mat’ for short.

On the right, we have a slightly more complex concept graph with cuts. As 
in existential graphs, it is allowed to iterate or nest the cuts (but cuts may not 
overlap). The meaning of this graph is “it is not true that there is a catholic, but
there is no woman this catholic adores”, or “every catholic adores some woman”
for short.¹

Existential graphs and concept graphs are a diagrammatic form of proposi- 
tions. 2 It is well known that Peirce developed a logic of relations as well, and the 

1 This example is adapted from Peirce.

2 More precisely: Of judgements, which are asserted propositions. But this distinction
shall not be discussed here.
[Figure: a relation graph (left) built from the relations female, male, married_with, mother_of and father_of, and the corresponding query graph (right).]

Fig. 2. A relation graph and the corresponding query graph



graphical notation of existential graphs can be used for describing relations as
well. Burch elaborated in his book ‘A Peircean Reduction Thesis’ ([1]) Peirce’s
algebra of relations, the so-called Peircean Algebraic Logic (PAL). But, although
the development of PAL is driven by the diagrammatic representation of
relations, Burch developed a linear notation for PAL and does not explain until the last
chapter of his book how this linear notation is related to its diagrammatic
representation. For the framework of Contextual Logic and inspired by the work
of Burch, Pollandt and Wille invented and investigated the so-called relation
graphs which represent relations (see [15, 16, 30]). The left graph of Fig. 2 is a
relation graph describing the relation is_stepmother_of.

The free (or, in other words, unsaturated) valences of the relation correspond
to so-called pending edges of the relation graphs, which are drawn as labelled
lines of identity (see [16]). For concept graphs with cuts, a small syntactical
extension allows us to represent free valences of a relation: In addition to object
names and the generic marker, numbered question marks called query markers
are allowed to be referents of concept boxes. The resulting graphs are termed
query graphs with cuts (QGwCs). The right graph of Fig. 2 is therefore a QGwC. It
describes the relation of all pairs of objects $(o_1, o_2)$ which can replace the query
markers ?1 and ?2, respectively, such that we obtain a valid concept graph (the
concept name $\top$ denotes the universal concept which contains every object of
the respective universe of discourse in its extension).
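
The idea that a query graph is evaluated to a relation can be sketched in a few lines of code. The following is only an illustration: the universe, the facts and the exact shape of the stepmother query (a female ?1 who is married with some male that is the father of ?2, inside which a cut denies that ?1 is the mother of ?2) are our assumptions, since the figure itself is not reproduced here.

```python
from itertools import product

# Hypothetical universe of discourse and relations (not taken from the paper).
universe = {"anna", "bert", "carl", "dora"}
female = {"anna", "dora"}
male = {"bert", "carl"}
married_with = {("anna", "bert"), ("bert", "anna")}
father_of = {("bert", "carl")}
mother_of = {("dora", "carl")}

def is_valid(o1, o2):
    """True if replacing ?1 by o1 and ?2 by o2 yields a valid concept graph."""
    if o1 not in female:
        return False
    # The cut: it is not true that o1 is the mother of o2.
    if (o1, o2) in mother_of:
        return False
    # The generic marker *: there is some male married with o1 and father of o2.
    return any((o1, m) in married_with and (m, o2) in father_of for m in male)

# The query graph is evaluated to the relation of all admissible pairs.
relation = {(o1, o2) for o1, o2 in product(universe, repeat=2) if is_valid(o1, o2)}
print(relation)
```

A graph without query markers would, in the same way, be evaluated to one of the two 0-ary relations, that is, to a truth-value.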

Pollandt and Wille focus on operations on relation graphs, that is, they are
interested in the algebra of relations. In contrast to that, we consider derivations
on graphs, i.e., our focus is the logic of relations. This logic will be elaborated in
the following sections. In the next section, the basic definitions for query graphs
with cuts are provided. In Sec. 3, we describe a direct extensional
semantics for query graphs. In Sec. 4, it is investigated how the calculus for
concept graphs with cuts can be extended for query graphs. In Sec. 5, the class of
query graphs is restricted so that they better fit the relation graphs of Burch,
Pollandt and Wille. This requires further investigations on the calculus. Finally,
an outlook for further research is given.

2 Basic Definitions for Query Graphs 

As discussed in the introduction, a drawback of conceptual graphs is a lack of 
mathematical preciseness, which leads to ambiguities and flaws in the system 
of conceptual graphs. The purpose of query graphs with cuts is to elaborate 
a diagrammatic system for the mathematical logic of relations. This elaboration 
is done as usual in mathematical logic, that is: We have to provide a syntax for
the graphs, a semantics, and a calculus which is sound and complete. Particularly, 
syntax, semantics, and the calculus have to be defined mathematically. 

Not every reader will be familiar with the use of mathematical notions. More- 
over, due to space limitations, it is impossible to provide all definitions, or even 
proofs of the following theorems, in this paper. For this reason, the paper is 
structured as follows: In this section, the necessary mathematical definitions for 
the syntax of QGwCs are given, and it is explained why these definitions cap- 
ture the intuition behind QGwCs, so that readers who are not trained in reading 
mathematical definitions hopefully get an idea how these definitions work. In the 
following sections, mathematical notations are avoided as much as possible. For 
those readers who are interested in the mathematical theory behind this
paper, an extended version of it is provided at the homepage of the author (see
the remark after the bibliography) which contains all further definitions and
proofs.

We start with the definition of the underlying structures of concept graphs
with cuts and query graphs with cuts. The examples in Fig. 1 and Fig. 2 show
that these graphs are “networks” of boxes, relation ovals, and cuts. We see that
relation ovals “connect” the boxes (but we have no direct connection of boxes).
The boxes and relation ovals are “grouped” by cuts, i.e., cuts contain boxes and
relation ovals. Cuts may even contain other cuts, as the last example of Fig. 1
shows, but they may not intersect. Besides the boxes, relation ovals, and cuts, it
is convenient to add the so-called sheet of assertion, i.e., the plane where the
diagram is written on, as a further element (e.g., this gives us the possibility
to say that each box, relation oval, or cut is contained by exactly one cut or
the sheet of assertion). These conditions will be captured mathematically by the
following definition.

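Definition 1 (Relational Graphs with Cuts).

A structure $(V, E, \nu, \top, \mathit{Cut}, \mathrm{area})$ is called a relational graph with cuts³ if

1. $V$, $E$ and $\mathit{Cut}$ are pairwise disjoint, finite sets whose elements are called
vertices, edges and cuts, respectively,

2. $\nu : E \to \bigcup_{k \in \mathbb{N}} V^k$ is a mapping⁴,

3. $\top$ is a single element with $\top \notin V \cup E \cup \mathit{Cut}$, called the sheet of assertion,
and

4. $\mathrm{area} : \mathit{Cut} \,\dot\cup\, \{\top\} \to \mathfrak{P}(V \,\dot\cup\, E \,\dot\cup\, \mathit{Cut})$ is a mapping such that⁵

a) $c_1 \neq c_2 \Rightarrow \mathrm{area}(c_1) \cap \mathrm{area}(c_2) = \emptyset$,

b) $V \cup E \cup \mathit{Cut} = \bigcup_{d \in \mathit{Cut} \cup \{\top\}} \mathrm{area}(d)$,

c) $c \notin \mathrm{area}^n(c)$ for each $c \in \mathit{Cut} \cup \{\top\}$ and $n \in \mathbb{N}$ (with $\mathrm{area}^0(c) := \{c\}$
and $\mathrm{area}^{n+1}(c) := \bigcup \{\mathrm{area}(d) \mid d \in \mathrm{area}^n(c)\}$).

3 Please do not mistake relation graphs and relational graphs.

4 We set $\mathbb{N} := \{1, 2, 3, \ldots\}$ and $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$.

5 The sign $\dot\cup$ denotes the disjoint union.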
For an edge $e \in E$ with $\nu(e) = (v_1, \ldots, v_k)$ we set $|e| := k$ and $\nu(e)|_i := v_i$.
Sometimes, we also write $e|_i$ instead of $\nu(e)|_i$, and $e = (v_1, \ldots, v_k)$ instead of
$\nu(e) = (v_1, \ldots, v_k)$. We set $E^{(k)} := \{e \in E \mid |e| = k\}$.

As for every $x \in V \cup E \cup \mathit{Cut}$ we have exactly one context $c \in \mathit{Cut} \cup \{\top\}$
with $x \in \mathrm{area}(c)$, we can write $c = \mathrm{area}^{-1}(x)$ for every $x \in \mathrm{area}(c)$, or even
more simple and suggestive: $c = \mathrm{cut}(x)$.

The sets of boxes, relation ovals, and cuts are mathematically modelled
by $V$, $E$ and $\mathit{Cut}$, respectively, and the (graphical) sheet of assertion is
mathematically modelled by a single element which is named sheet of assertion as
well. Boxes are linked to relation ovals: This is modelled by the mapping $\nu$.
This mapping can be understood as follows: If we have an edge $e$ with
$\nu(e) = (v_1, v_2, \ldots, v_n)$, then the relation oval corresponding to $e$ is linked to the $n$ boxes⁶
which correspond to $v_1, \ldots, v_n$, respectively. Finally, for a cut $c$, the set $\mathrm{area}(c)$
contains the vertices (boxes), edges (relation ovals) and cuts which are directly
contained by $c$. We have to add some restrictions to the mapping area. For
example, we have seen above that cuts must not overlap. This is mathematically
captured by condition a) for area.
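
To make the three conditions on area concrete, they can be checked mechanically for a finite structure. The following is a minimal sketch, not part of the formal development: the encoding (Python sets and dictionaries, the string "T" for the sheet of assertion) and the name `violations` are our own choices.

```python
TOP = "T"  # stands for the sheet of assertion

def violations(vertices, edges, cuts, area):
    """List violated conditions of the area mapping (conditions a to c).

    vertices, cuts: sets of names; edges: dict edge -> tuple of vertices (nu);
    area: dict context -> set of directly contained elements.
    """
    contexts = cuts | {TOP}
    elements = vertices | set(edges) | cuts
    problems = []

    # a) the areas of two different contexts are disjoint
    owner = {}
    for c in contexts:
        for x in area.get(c, set()):
            if x in owner:
                problems.append(f"a) {x} lies in {owner[x]} and {c}")
            owner[x] = c

    # b) every vertex, edge and cut lies in the area of some context
    if set(owner) != elements:
        problems.append("b) areas do not cover exactly V, E and Cut")

    # c) no context is contained in itself, directly or indirectly
    for c in contexts:
        level = {c}
        for _ in range(len(contexts)):
            level = set().union(*(area.get(d, set()) for d in level))
            if c in level:
                problems.append(f"c) {c} encloses itself")
                break
    return problems

good = violations({"v"}, {"e": ("v",)}, {"c"}, {TOP: {"v", "c"}, "c": {"e"}})
overlapping = violations({"v"}, {}, {"c1", "c2"},
                         {TOP: {"c1", "c2"}, "c1": {"v"}, "c2": {"v"}})
print(good)
print(overlapping)
```

In the second structure the vertex lies in the area of two different cuts, which is exactly the overlap that condition a) excludes.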

Def. 1 is an abstract definition of graphs which does not try to capture any 
graphical properties of the diagrams. Instead, the diagrams have to be under- 
stood as graphical representations of the graphs (a discussion of the distinction 
between graphs and their representations can be found in [4] and [10]). An ex- 
ample for a relational graph with cuts and its representation will be provided 
after the next definition. 

In contrast to linear notations of logic, there is no need to define the graphs
inductively. Nonetheless, similar to formulas, relational graphs bear an inner
structure. A context $c$ of a relational graph with cuts may contain other cuts $d$ in its
area (i.e. $d \in \mathrm{area}(c)$), which in turn may contain further cuts, etc. It has to
be expected that this idea induces an order $\leq$ on the contexts which should be
a tree, having the sheet of assertion $\top$ as greatest element. This order $\leq$ has to
be defined now.

If $c$ is a context and $x \in \mathrm{area}(c)$ is a vertex, edge, or cut, we say that $x$ is
directly enclosed by $c$. If we have a chain of contexts $c = c_1, c_2, \ldots, c_n$ such that
each $c_{i+1}$ is directly enclosed by its predecessor $c_i$, and we have $x \in \mathrm{area}(c_n)$, we
say that $x$ is enclosed by $c$. Of course, if $x$ is directly enclosed by a context $c$,
then it is enclosed by $c$ as well. In the following, for two contexts $c, d$ we write
$c < d$ if $c$ is enclosed by $d$, and $c \leq d$ if $c < d$ or $c = d$. The use of the symbol
‘$\leq$’ can be justified with the next lemma, which states that $\leq$ is an order on the
set of contexts.

So far, we have only defined $c \leq d$ for contexts $c, d$, but it is reasonable to
extend this definition to the vertices and edges as well. Due to technical reasons,
a vertex or an edge $x$ is identified with the context $c$ which directly encloses $x$,
that is, with $c = \mathrm{cut}(x)$. For example, if $v$ is a vertex and $c$ is a context, we will
write $v \leq c$ if we have $\mathrm{cut}(v) \leq c$.

6 Mathematically trained readers will note that this is not totally correct: We do not
assume that $v_1, \ldots, v_n$ are pairwise different, thus we may have fewer than $n$ boxes.
The explanation is for readers who are not familiar with mathematical notation.

Finally, for an element $x$, let $n$ be the number of cuts which enclose $x$. If $n$
is even, $x$ is said to be evenly enclosed, otherwise $x$ is said to be oddly enclosed.
The sheet of assertion $\top$ and each oddly enclosed cut is called a positive context,
and each evenly enclosed cut is called a negative context.

As it has been shown in [3], we get the following lemma: 

Lemma 1. For a relational graph with cuts $(V, E, \nu, \top, \mathit{Cut}, \mathrm{area})$, $\leq$ is a
quasiorder. Furthermore, $\leq|_{\mathit{Cut} \cup \{\top\}}$ is an order on $\mathit{Cut} \cup \{\top\}$ which is a tree with
the sheet of assertion $\top$ as greatest element.

The ordered set of contexts $(\mathit{Cut} \cup \{\top\}, \leq)$ can be considered to be the
‘skeleton’ of a relational graph. For linear notations of logic, where the well-formed
formulas are defined inductively, many proofs are carried out inductively over the 
construction of formulas. Although graphs are not defined inductively, Lem. 1 
now allows us to do inductive definitions and proofs as well. 

Of course, the preceding lemma is not surprising: It had to be expected. But
as the results, which are clear from a naive understanding of concept graphs, can
be proven, the lemma indicates that Def. 1 and the definition of $\leq$ are a ‘correct’
mathematization of the underlying structure of query graphs with cuts.

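The following figure provides a simple example for the last definitions. In
the first two lines, a relational graph with cuts $\mathfrak{G}$ is defined (of course, $\{v_1, v_2\}$,
$\{e_1, e_2\}$, $\{c_1, c_2, c_3\}$, $\{\top\}$ are disjoint sets with $v_1 \neq v_2$, $e_1 \neq e_2$ and $c_1, c_2, c_3$
pairwise distinct). Below, a graphical representation of $\mathfrak{G}$ and of its order $\leq$ is depicted.

$\mathfrak{G} := (\{v_1, v_2\}, \{e_1, e_2\}, \{(e_1, (v_1)), (e_2, (v_1, v_2))\}, \top, \{c_1, c_2, c_3\},$

$\{(\top, \{c_1\}), (c_1, \{v_1, c_2, c_3\}), (c_2, \{e_1\}), (c_3, \{v_2, e_2\})\})$

[Figure: the graphical representation of $\mathfrak{G}$ and the tree of its order $\leq$: $\top$ at the top, below it $c_1$ (with $v_1$), and below $c_1$ the two cuts $c_2$ (with $e_1$) and $c_3$ (with $v_2$, $e_2$).]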

As this example shows, the vertices are usually drawn as boxes, and edges are
drawn as ovals. For an edge $e = (v_1, \ldots, v_n)$, each concept box of the incident
vertices $v_1, \ldots, v_n$ is connected by a line to the relation oval of $e$. These lines
are numbered $1, \ldots, n$. If it cannot be misunderstood, this numbering is often
omitted. There may be graphs whose lines cannot be drawn without
crossing one another. In order to distinguish such lines from each other, Peirce
introduced a device he called a ‘bridge’ or ‘frog’ (see [19], p. 55). But, except for
bridges between lines, all the boxes, ovals, and lines of a graph must not intersect.
Finally, a cut is drawn as a closed curve (usually an oval) which exactly contains 
in its inner space all the concept boxes, ovals, and curves of the vertices, edges, 
and other cuts, resp., which the cut encloses (not necessarily directly). In order 
to distinguish the curves of cuts from relation ovals, they are drawn bold. 
In our example, the cut $c_1$ is directly enclosed by the sheet of assertion $\top$,
the cuts $c_2$, $c_3$ and the vertex $v_1$ are directly enclosed by the cut $c_1$, the edge $e_1$
is directly enclosed by the cut $c_2$, and $v_2$ and $e_2$ are directly enclosed by the
cut $c_3$.
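
The enclosure relations of this example can also be computed directly from the area mapping. The sketch below uses the area mapping of the example; the dictionary encoding and the helper names (`enclosing_contexts`, `depth`, `is_positive_context`) are our own and only meant as an illustration of cut(x), of the order on contexts, and of even and odd enclosure.

```python
TOP = "T"  # the sheet of assertion

# area mapping of the example graph, copied from its definition in the text
area = {
    TOP: {"c1"},
    "c1": {"v1", "c2", "c3"},
    "c2": {"e1"},
    "c3": {"v2", "e2"},
}

# cut(x): the unique context that directly encloses x
cut = {x: c for c, members in area.items() for x in members}

def enclosing_contexts(x):
    """All contexts that enclose x, from the innermost outwards."""
    chain = []
    while x in cut:          # TOP itself has no enclosing context
        x = cut[x]
        chain.append(x)
    return chain

def depth(x):
    """Number of cuts (TOP is not a cut) which enclose x."""
    return sum(1 for c in enclosing_contexts(x) if c != TOP)

def is_positive_context(c):
    """TOP and oddly enclosed cuts are positive, evenly enclosed cuts negative."""
    return c == TOP or depth(c) % 2 == 1

for x in ["c1", "c2", "c3", "v1", "e1", "v2", "e2"]:
    print(x, enclosing_contexts(x), depth(x))
```

Since every element has exactly one directly enclosing context, following `cut` upwards always ends at the sheet of assertion, which reflects the tree structure stated in Lemma 1.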

The following graph represents another relational graph with cuts: 



In this graph, we have an edge with an incident and deeper nested vertex. In the
semantics for QGwCs, it will turn out that graphs with this property may cause
trouble (we will come back to this point in Sec. 3). Thus, we have to forbid
graphs of this kind. This is captured by the following definition:

Definition 2 (Dominating Nodes).

If $\mathrm{cut}(e) \leq \mathrm{cut}(v)$ ($\Leftrightarrow e \leq v$) for every $e \in E$ and $v \in V_e$, then $\mathfrak{G}$ is said to
have dominating nodes.
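
The condition of Def. 2 can be tested with a short routine. The sketch below assumes that $V_e$ denotes the vertices incident with $e$, and it encodes $\nu$ and area as Python dictionaries; the function name and the second, violating graph are hypothetical.

```python
TOP = "T"  # the sheet of assertion

def has_dominating_nodes(nu, area):
    """Check cut(e) <= cut(v) for every edge e and every vertex v incident with e."""
    cut = {x: c for c, members in area.items() for x in members}

    def leq(c, d):
        """c <= d: the context c equals d or is enclosed by d."""
        while c != d and c in cut:
            c = cut[c]
        return c == d

    return all(leq(cut[e], cut[v]) for e, vs in nu.items() for v in vs)

# The example graph from the text.
nu = {"e1": ("v1",), "e2": ("v1", "v2")}
area = {TOP: {"c1"}, "c1": {"v1", "c2", "c3"}, "c2": {"e1"}, "c3": {"v2", "e2"}}

# A hypothetical graph whose vertex is nested deeper than its edge.
nu_bad = {"e": ("v",)}
area_bad = {TOP: {"e", "c"}, "c": {"v"}}

print(has_dominating_nodes(nu, area))
print(has_dominating_nodes(nu_bad, area_bad))
```

In the second graph the edge lies on the sheet of assertion while its vertex lies inside a cut, which is the situation the definition forbids.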

Now QGwCs are obtained from relational graphs by additionally labelling
the vertices and edges with names for objects, concepts, and relations. We first
define the underlying alphabet for our graphs, then QGwCs are defined.

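Definition 3 (Alphabet).

An alphabet is a triple $\mathcal{A} := (\mathcal{G}, \mathcal{C}, \mathcal{R})$ of disjoint sets $\mathcal{G}$, $\mathcal{C}$, $\mathcal{R}$ such that

- $\mathcal{G}$ is a finite set whose elements are called object names,⁷

- $(\mathcal{C}, \leq_{\mathcal{C}})$ is a finite ordered set with a greatest element $\top$ whose elements are
called concept names, and

- $(\mathcal{R}, \leq_{\mathcal{R}})$ is a family of finite ordered sets $(\mathcal{R}_k, \leq_{\mathcal{R}_k})$, $k = 1, \ldots, n$ (for an
$n \in \mathbb{N}$) whose elements are called relation names. Let $\doteq \; \in \mathcal{R}_2$ be a special
name which is called identity.

On $\mathcal{G} \cup \{*\}$ we define an order $\leq_{\mathcal{G}}$ such that $*$ is the greatest element of $\mathcal{G} \cup \{*\}$,
but all elements of $\mathcal{G}$ are incomparable.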

Definition 4 (Query Graphs with Cuts).

A structure $\mathfrak{G} := (V, E, \nu, \top, \mathit{Cut}, \mathrm{area}, \kappa, \rho)$ is called query graph with cuts
over the alphabet $\mathcal{A}$, when

- $(V, E, \nu, \top, \mathit{Cut}, \mathrm{area})$ is a relational graph with cuts that has dominating
nodes,

- $\kappa : V \cup E \to \mathcal{C} \cup \mathcal{R}$ is a mapping such that $\kappa(V) \subseteq \mathcal{C}$, $\kappa(E) \subseteq \mathcal{R}$, and all
$e \in E$ with $|e| = k$ satisfy $\kappa(e) \in \mathcal{R}_k$, and

- $\rho : V \to \mathcal{G} \cup \{*\} \cup \{?i \mid i \in \mathbb{N}\}$ is a mapping such that there exists a natural
number $\mathrm{ar}(\mathfrak{G}) \in \mathbb{N}_0$ with $\{i \mid \exists v \in V \text{ with } \rho(v) = {?i}\} = \{1, \ldots, \mathrm{ar}(\mathfrak{G})\}$. The
number $\mathrm{ar}(\mathfrak{G})$ is called the arity of $\mathfrak{G}$.

If $\mathrm{ar}(\mathfrak{G}) = 0$, then $\mathfrak{G}$ is called concept graph with cuts over the alphabet $\mathcal{A}$.

For the set $E$ of edges, let $E^{id} := \{e \in E \mid \kappa(e) = \; \doteq\}$ and $E^{nonid} := \{e \in E \mid
\kappa(e) \neq \; \doteq\}$. The elements of $E^{id}$ are called identity-links.

7 The letter $\mathcal{G}$ stands for the German word ‘Gegenstände’, i.e., ‘objects’. This letter
will recur when we define formal contexts where we have a set $G$ of objects.
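
The condition on the query markers says that the indices in use form an initial segment $1, \ldots, \mathrm{ar}(\mathfrak{G})$ without gaps. A small sketch of this check follows; it assumes that referents are encoded as strings, with "*" for the generic marker and "?i" for query markers, and the function name `arity` is our own.

```python
import re

def arity(referents):
    """Return the arity for a collection of referents, or raise ValueError.

    Referents are strings: object names, "*" (generic marker) or "?i".
    """
    indices = set()
    for r in referents:
        match = re.fullmatch(r"\?([1-9][0-9]*)", r)
        if match:
            indices.add(int(match.group(1)))
    n = len(indices)
    # The indices in use must be exactly 1, ..., n.
    if indices != set(range(1, n + 1)):
        raise ValueError(f"query markers {sorted(indices)} are not 1..{n}")
    return n

print(arity(["Yoyo", "*"]))        # no query markers: a concept graph with cuts
print(arity(["?1", "*", "?2"]))    # a query graph describing a binary relation
```

A query marker may occur in several concept boxes; only the set of indices matters for the arity.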



In the following, the alphabet is considered to be fixed, thus we simply speak
of ‘query graphs with cuts’. As already done, the terms ‘concept graph with
cuts’ and ‘query graph with cuts’ will be abbreviated by CGwC and QGwC,
respectively.

For the graphical representation of QGwCs, the underlying relational graph
is drawn as explained above. Now, inside the rectangle for a vertex $v$, we write
first the concept name $\kappa(v)$ and then the referent $\rho(v)$, separated by a colon. As
already done, these rectangles are called concept boxes (this graphical notation
is used in continuous text, too, e.g. we will write ‘let $v$ := P : g’ instead of ‘let $v$
be a vertex with $\kappa(v) = P \in \mathcal{C}$ and $\rho(v) = g \in \mathcal{G}$’). Analogously, for an edge $e$,
we write its relation name $\kappa(e)$ into the representing oval. These ovals are called
relation ovals.



3 Contextual Semantics 

For most kinds of mathematical logic, a semantics in the form of extensional
models is provided. Particularly, for first order logic, the extensional models
are relational structures $\mathcal{M} := (U, I)$, consisting of a universe (of discourse) $U$
and a function $I$, which assigns objects, relations and functions in $U$ to the
object-, relation- or function-names of the alphabet. If mathematical logic is
done with diagrams, there is often no direct extensional semantics provided (see 
for example [21, 31]). Instead, a translation from the graphs to first order logic 
is given, so that the models of first order logic serve indirectly as models for the 
graphs as well. 

Formulas and graphs are very different ‘styles’ of logic, thus it seems a little
bit awkward and inappropriate that the semantics, i.e., meaning, of graphs can
only be gained indirectly via first order logic. Therefore, this approach is not 
adopted here, but a direct semantics for graphs is provided. This will be done 
in the following subsections. 

3.1 Contextual Models 

The semantics used here is a so-called contextual semantics, which is based on
Formal Concept Analysis (FCA). This semantics was introduced by Wille in [28],
and a comprehensive mathematical elaboration of FCA can be found in [7]. The
basic structures of FCA are formal contexts. Roughly speaking, a formal context
$\mathbb{K}$ is a cross-table, fixing a set of objects $G$ and a set of attributes $M$,
and an incidence-relation $I$ between these sets, indicating that an object $g$
has an attribute $m$. In order to describe relations between objects, so-called
power context families (PCFs) are introduced. A PCF is a family
$(\mathbb{K}_0, \mathbb{K}_1, \mathbb{K}_2, \ldots, \mathbb{K}_n)$ of formal contexts such that the
objects in the contexts $\mathbb{K}_i$, $i \geq 1$, are tuples of the objects of $\mathbb{K}_0$.
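
As an informal aside, a power context family can be sketched in Python as a
list of formal contexts, each given by its objects, its attributes and its
incidence relation. The class name `FormalContext`, the helper
`is_power_context_family` and the toy data are assumptions made for this
sketch only; in particular, the data are not the contents of Fig. 3.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FormalContext:
    """A formal context (G, M, I): objects, attributes, incidence relation."""
    objects: frozenset
    attributes: frozenset
    incidence: frozenset  # set of (object, attribute) pairs

    def has(self, obj, attr) -> bool:
        """True if object `obj` has attribute `attr`."""
        return (obj, attr) in self.incidence


def is_power_context_family(contexts) -> bool:
    """Check that the objects of K_i (i >= 1) are i-tuples over the objects of K_0."""
    base = contexts[0].objects
    for i, ctx in enumerate(contexts[1:], start=1):
        for obj in ctx.objects:
            if not (isinstance(obj, tuple) and len(obj) == i
                    and all(o in base for o in obj)):
                return False
    return True


# Hypothetical toy data (assumption, not taken from the paper).
k0 = FormalContext(
    objects=frozenset({"a", "b"}),
    attributes=frozenset({"TOP", "PROF"}),
    incidence=frozenset({("a", "TOP"), ("b", "TOP"), ("a", "PROF")}),
)
k1 = FormalContext(
    objects=frozenset({("a",), ("b",)}),
    attributes=frozenset({"male"}),
    incidence=frozenset({(("a",), "male")}),
)
k2 = FormalContext(
    objects=frozenset(product(k0.objects, repeat=2)),
    attributes=frozenset({"advisor_of"}),
    incidence=frozenset({(("a", "b"), "advisor_of")}),
)
pcf = [k0, k1, k2]
```

The check only encodes the tuple condition stated above; it says nothing about
concepts or relation-concepts, which are not discussed in this paper.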
Fig. 3. An example of a power context family



In Fig. 3, an example of a PCF is depicted. It describes the working group
of the author. The objects are the members of the working group (e.g., ‘RW’
stands for ‘Rudolf Wille’, the inventor of FCA and the advisor of the author, and
‘FD’ stands for the author himself). The meaning of the attributes is obvious
(the attribute $\top$ in $\mathbb{K}_0$ is used for the universal concept, which has
already been mentioned in the introduction; the attribute $\doteq$ in $\mathbb{K}_2$ is
the identity).

In the mathematical elaboration of contextual semantics, the object-,
concept- and relation-names are interpreted in PCFs, i.e., they are mapped to
objects, concepts and relation-concepts, respectively. This yields so-called
contextual structures. These structures, and the underlying terms like ‘concept’
or ‘relation-concept’, shall not be discussed here. Again, this can be found in
the extended version of this paper. In the context of this paper, the following
simplified understanding is sufficient: The object-names are the objects of
$\mathbb{K}_0$, the concept names are the attributes of $\mathbb{K}_0$, and the
relation-names of arity $n$ are the attributes in $\mathbb{K}_n$. For this reason, we
identify PCFs and contextual structures. We use the letter $\mathcal{M}$ to denote
contextual structures.
3.2 Evaluation of Graphs in Contextual Models

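The evaluation of a graph in a contextual structure shall first be described with
some examples. Consider $\mathfrak{G}_1$ of the following two graphs:

[Diagram: $\mathfrak{G}_1$ is a cut enclosing the concept box PROF : * and an inner cut
with the relation oval ‘male’ attached to that box. $\mathfrak{G}_2$ is a cut enclosing
two concept boxes $\top$ : *, linked by the relation ovals ‘Advisor of’ and
‘Coauthor of’, the latter placed in an inner cut.]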



The evaluation of $\mathfrak{G}_1$ starts on the sheet of assertion $\top$ and proceeds
inwardly (this is the so-called endoporeutic method of Peirce for existential
graphs). As only the outermost cut (let us call it $c_1$) is directly enclosed by
$\top$, we know that $\mathfrak{G}_1$ is true if the subgraph which is enclosed by $c_1$ is
false. This subgraph contains a vertex $v$ and a further cut $c_2$. We now have:
$\mathfrak{G}_1$ is true if it is not true that there exists an object $o$ such that $o$ is
a professor and the proposition enclosed by $c_2$ is false. Now we have to evaluate
the area of $c_2$. This area contains only one edge, and the unary relation of this
edge refers to the object $o$. Hence $\mathfrak{G}_1$ is true if there is no professor such
that this professor is not male. In simpler words: Every professor is male. This
proposition is true in our given contextual structure.

Similarly, the meaning of the right graph is ‘it is not true that there are two
objects $o_1, o_2$ such that $o_1$ is the⁸ advisor of $o_2$, but $o_2$ is not a
co-author of $o_1$’. In other words: Each advisor of a person is a co-author of
that person as well. In our given contextual structure, this proposition is false.
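
The two readings can be mimicked by a small Python sketch. The sets below form
a hypothetical contextual structure chosen for illustration (they are an
assumption and do not reproduce Fig. 3); the nested `not any(...)` expressions
follow the inward reading of the cuts described above.

```python
# Hypothetical contextual structure (assumption, not the data of Fig. 3).
objects = {"RW", "FD", "JK"}
prof = {"RW"}
male = {"RW", "FD"}
advisor_of = {("RW", "FD"), ("RW", "JK")}
coauthor_of = {("RW", "FD"), ("FD", "RW")}

# First graph: it is not true that there is an object o
# with PROF(o) such that male(o) is false.
g1 = not any(o in prof and o not in male for o in objects)

# Second graph: it is not true that there are o1, o2 such that o1 is an
# advisor of o2 but o2 is not a co-author of o1.
g2 = not any(
    (o1, o2) in advisor_of and (o2, o1) not in coauthor_of
    for o1 in objects
    for o2 in objects
)
```

In this toy structure the first graph evaluates to true and the second to
false, because the pair (RW, JK) is an advisor pair without a matching
co-author pair.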

Now it can be explained why we forced the graphs to have dominating nodes. 
Consider the next two graphs, where the right graph has no dominating nodes: 



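[Diagram: left, a cut enclosing the concept box PROF : *; right, a cut enclosing
the concept box PROF : *, which is linked to a relation oval ‘male’ placed outside
the cut.]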



The meaning of the left graph is clear: ‘It is not true that there is a professor’.
Particularly, the generic marker is existentially quantified (‘there is’), and this
quantification takes place inside the cut. But now, if we try to read the right
graph inwardly, we have to evaluate the edge labelled with ‘male’, which refers
to the object of the concept box inside the cut. Therefore, the ‘place’ of the
existential quantification moves outside the cut, i.e., the scope of the generic
marker has to be extended to the sheet of assertion. This is possible, but it
makes the reading of CGwCs very complicated. Therefore, graphs like this are not
allowed. (A more comprehensive discussion of dominating nodes can be found in [3].)

CGwCs are a formalization of propositions. For a given contextual structure,
such a proposition is either true or false. As usual, we write
$\mathcal{M} \models \mathfrak{G}$ if $\mathfrak{G}$ evaluates to TRUE in the contextual structure
$\mathcal{M}$. E.g., if $\mathcal{M}$ denotes our given contextual structure, we have
$\mathcal{M} \models \mathfrak{G}_1$ and $\mathcal{M} \not\models \mathfrak{G}_2$. Moreover, for two graphs
$\mathfrak{G}_a, \mathfrak{G}_b$, we write $\mathfrak{G}_a \models \mathfrak{G}_b$ if we have
$\mathcal{M} \models \mathfrak{G}_a \Rightarrow \mathcal{M} \models \mathfrak{G}_b$ for each model $\mathcal{M}$, i.e.,
$\mathfrak{G}_b$ is true in every model where $\mathfrak{G}_a$ is true.

8 More precisely, we should write ‘an advisor’ instead of ‘the advisor’.

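The semantics which has been exemplified so far is the semantics for CGwCs,
as it is described in [3]. This semantics can naturally be extended to QGwCs.
Consider the following two QGwCs:

[Diagram: in the first QGwC, the concept box $\top$ : ?1 is attached to the relation
oval ‘female’ and linked by ‘Coauthor of’ to the concept box $\top$ : FD. In the
second QGwC, the concept boxes $\top$ : ?1 and $\top$ : ?2 are linked by ‘Advisor of’
and, inside a cut, by ‘Coauthor of’.]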



In contrast to CGwCs, QGwCs are not a formalization of propositions; instead,
their evaluation in a contextual structure yields relations. The idea is simple:
For a given contextual structure $\mathcal{M}$, a QGwC of arity $n$ (i.e., it contains
the query markers ?1, ..., ?n) describes the relation of all $n$-tuples
$(o_1, \ldots, o_n)$ of objects which can replace the query markers ?1, ..., ?n,
respectively, such that we obtain a concept graph which is valid in $\mathcal{M}$.

The first graph can be understood to be the following query: ‘Give me all
persons⁹ which are female and which are a co-author of FD’, or ‘give me all
female co-authors of FD’ for short. In our example, we obtain only one person,
namely JK.

The second graph can be understood to be the following query: ‘Give me all
pairs of persons such that the first person is the advisor, but not a co-author,
of the second person.’ In our example, we obtain the following set of pairs:
{(RW, JK), (RW, TK), (RW, BW)}.
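
A query of arity n can be sketched in Python as a filter over all n-tuples of
objects. The sets below are toy data chosen by us so that the answers coincide
with those reported above; they are an assumption and not a transcription of
Fig. 3, and the helper name `evaluate` is hypothetical.

```python
from itertools import product

# Toy data (assumption): picked so that the answers match the text.
objects = ["RW", "FD", "JK", "TK", "BW"]
female = {"JK"}
coauthor_of = {("JK", "FD"), ("FD", "JK"), ("RW", "FD"), ("FD", "RW")}
advisor_of = {("RW", "FD"), ("RW", "JK"), ("RW", "TK"), ("RW", "BW")}


def evaluate(arity, condition):
    """Relation of all `arity`-tuples of objects for which `condition` holds."""
    return {t for t in product(objects, repeat=arity) if condition(*t)}


# First query: all female co-authors of FD (arity 1).
query_1 = evaluate(1, lambda o: o in female and (o, "FD") in coauthor_of)

# Second query: pairs where the first object is an advisor,
# but not a co-author, of the second (arity 2).
query_2 = evaluate(
    2,
    lambda o1, o2: (o1, o2) in advisor_of and (o1, o2) not in coauthor_of,
)
```

Each result is a set of tuples, so a unary query yields 1-tuples such as
`("JK",)` rather than bare objects.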

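In the following, for a contextual structure $\mathcal{M}$ and a QGwC $\mathfrak{G}$, the
relation obtained by evaluating $\mathfrak{G}$ in $\mathcal{M}$ shall be denoted by
$\mathfrak{R}_{\mathcal{M},\mathfrak{G}}$. Note that a notion like ‘$\mathcal{M} \models \mathfrak{G}$’ is
meaningless. But if we have two QGwCs $\mathfrak{G}_a, \mathfrak{G}_b$, we write
$\mathfrak{G}_a \models_n \mathfrak{G}_b$ if we have
$\mathfrak{R}_{\mathcal{M},\mathfrak{G}_a} \subseteq \mathfrak{R}_{\mathcal{M},\mathfrak{G}_b}$ for each model $\mathcal{M}$,
that is, $\mathfrak{G}_b$ describes a ‘bigger’ relation than $\mathfrak{G}_a$.

It should be noted that this definition extends the definition given in [3] for
concept graphs with cuts, which are evaluated to one of the truth-values TRUE
and FALSE in models: If $\mathfrak{G}$ is a concept graph with cuts, then
$\mathfrak{R}_{\mathcal{M},\mathfrak{G}}$ is one of the 0-ary relations {} and {{}}. The relation {}
can be interpreted as FALSE, the relation {{}} can be interpreted as TRUE. Then

$$\mathcal{M} \models \mathfrak{G} \iff \mathfrak{R}_{\mathcal{M},\mathfrak{G}} = \{\{\}\} \quad\text{and}\quad \mathfrak{G}_a \models \mathfrak{G}_b \iff \mathfrak{G}_a \models_0 \mathfrak{G}_b$$

for concept graphs $\mathfrak{G}, \mathfrak{G}_a, \mathfrak{G}_b$ and contextual models $\mathcal{M}$.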
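
The role of the two 0-ary relations can be illustrated with a short Python
sketch. Here the empty tuple `()` plays the part of the empty tuple written {}
in the text, so `{()}` corresponds to {{}} and `set()` to {}; the helper name
`evaluate` and the object set are assumptions made for this illustration.

```python
from itertools import product


def evaluate(objects, arity, condition):
    """Relation of all `arity`-tuples over `objects` satisfying `condition`."""
    return {t for t in product(objects, repeat=arity) if condition(*t)}


objects = {"a", "b"}   # hypothetical universe
TRUE = {()}            # 0-ary relation containing the empty tuple
FALSE = set()          # empty 0-ary relation

# For arity 0 there is exactly one candidate tuple, the empty one,
# so the result is TRUE or FALSE depending on the proposition.
holds = evaluate(objects, 0, lambda: True)
fails = evaluate(objects, 0, lambda: False)

# Entailment between propositions becomes inclusion of 0-ary relations:
# FALSE is included in everything, TRUE only in TRUE.
false_entails_true = FALSE <= TRUE
true_entails_false = TRUE <= FALSE
```

Under this encoding, inclusion of relations specializes to the usual
truth-table of implication when the arity is 0.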



4 Calculus 

In [3], a sound and complete calculus for concept graphs with cuts is provided.
This calculus is based on Peirce’s calculus for the beta part of existential
graphs, which is here extended in order to capture the syntactical differences and
the higher expressiveness of concept graphs with cuts. As we can nearly adopt
this calculus for QGwCs, we repeat it here, using common spoken language. For
the mathematical definitions of the rules, we refer to [3].

9 More formally, we should write ‘object’ instead of ‘person’.

Definition 5 (Calculus for Concept Graphs with Cuts).

The calculus for concept graphs with cuts over the alphabet
$\mathcal{A} := (\mathcal{G}, \mathcal{C}, \mathcal{R})$ consists of the following rules:

- Erasure

In positive contexts, any directly enclosed edge, isolated vertex, and closed
subgraph may be erased.

- Insertion

In negative contexts, any directly enclosed edge, isolated vertex, and closed
subgraph may be inserted.

- Iteration

Let $\mathfrak{G}_0 := (V_0, E_0, \nu_0, \top_0, Cut_0, area_0, \kappa_0, \rho_0)$ be a (not
necessarily closed) subgraph of $\mathfrak{G}$ and let $c \leq cut(\mathfrak{G}_0)$ be a context
such that $c \notin Cut_0$. Then a copy of $\mathfrak{G}_0$ may be inserted into $c$. For
every vertex $v \in V_0$ with $cut(v) = cut(\mathfrak{G}_0)$, an identity-link from $v$ to
its copy may be inserted.

- Deiteration

If $\mathfrak{G}_0$ is a subgraph of $\mathfrak{G}$ which could have been inserted by the rule of
iteration, then it may be erased.

- Double Cuts

Double cuts (two cuts $c_1, c_2$ with $area(c_1) = \{c_2\}$) may be inserted or
erased.

- Generalization

For evenly enclosed vertices and edges, their concept names or object names
resp. their relation names may be generalized.

- Specialization

For oddly enclosed vertices and edges, their concept names or object names
resp. their relation names may be specialized.

- Isomorphism

A graph may be substituted by an isomorphic copy of itself.

- Exchanging Referents

Let $e \in E^{id}$ be an identity link with $\rho(e|_1) = g_1$, $\rho(e|_2) = g_2$,
$g_1, g_2 \in \mathcal{G} \cup \{*\}$ and $cut(e) = cut(e|_1) = cut(e|_2)$. Then the
referents of the incident vertices $v_1 := e|_1$ and $v_2 := e|_2$ may be
exchanged, i.e., the following may be done: We can set $\rho(e|_1) = g_2$ and
$\rho(e|_2) = g_1$.

- Merging Two Vertices

Let $e \in E^{id}$ be an identity link with $\nu(e) = (v_1, v_2)$ such that
$cut(v_1) \geq cut(e) = cut(v_2)$, $\rho(v_1) = \rho(v_2)$ and $\kappa(v_2) = \top$
hold. Then $v_1$ may be merged into $v_2$, i.e., $v_1$ and $e$ are erased and, for
every edge $e \in E$, $e|_i = v_1$ is replaced by $e|_i = v_2$.

- Splitting a Vertex

Let $g \in \mathcal{G} \cup \{*\}$. Let $v = \boxed{P : g}$ be a vertex in the context $c_0$
and incident with relation edges $R_1, \ldots, R_n$, placed in contexts
$c_1, \ldots, c_n$, respectively. Let $c$ be a context such that
$c_1, \ldots, c_n \leq c \leq c_0$. Then the following may be done: In $c$, a new
vertex $v' = \boxed{\top : g}$ and a new identity-link between $v$ and $v'$ are
inserted. On $R_1, \ldots, R_n$, arbitrary occurrences of $v$ are substituted by $v'$.

- $\top$-Erasure

For $g \in \mathcal{G} \cup \{*\}$, an isolated vertex $\boxed{\top : g}$ may be erased from
arbitrary contexts.

- $\top$-Insertion

For $g \in \mathcal{G} \cup \{*\}$, an isolated vertex $\boxed{\top : g}$ may be inserted in
arbitrary contexts.

- Identity-Erasure

Let $g \in \mathcal{G}$, let $v_1 = \boxed{P_1 : g}$ and $v_2 = \boxed{P_2 : g}$ be two
vertices. Then any identity-link between $v_1$ and $v_2$ may be erased.

- Identity-Insertion

Let $g \in \mathcal{G}$, let $v_1 = \boxed{P_1 : g}$, $v_2 = \boxed{P_2 : g}$ be two vertices
in contexts $c_1, c_2$, resp., and let $c \leq c_1, c_2$ be a context. Then an
identity-link between $v_1$ and $v_2$ may be inserted into $c$.



A proof for two graphs $\mathfrak{G}_a, \mathfrak{G}_b$ is defined as usual in logic, i.e., it is
a sequence of graphs, starting with $\mathfrak{G}_a$, ending with $\mathfrak{G}_b$, where each
graph of the sequence is derived from its predecessor by one of the rules of the
calculus. As usual, this will be denoted $\mathfrak{G}_a \vdash \mathfrak{G}_b$.

The question arises how the calculus for CGwCs can be extended to a calculus
for QGwCs. The basic idea is that query markers can be interpreted as ‘generic
object names’. Thus it is to be expected that, if we treat the query markers
like object names, we get an adequate calculus for QGwCs. The definition of the
calculus is as follows:



Definition 6 (Calculus for Query Graphs with Cuts).

The calculus for QGwCs consists of the rules of the calculus for concept graphs
with cuts (Def. 5), where

1. the query markers ?i are treated like object names, and

2. an application of a rule to a QGwC
$\mathfrak{G}_a := (V_a, E_a, \nu_a, \top_a, Cut_a, area_a, \kappa_a, \rho_a)$ with arity $n$ is
only allowed if it preserves the arity, i.e., for the derived graph
$\mathfrak{G}_b := (V_b, E_b, \nu_b, \top_b, Cut_b, area_b, \kappa_b, \rho_b)$ we have

$$\{i \mid \exists v \in V_a \text{ with } \rho_a(v) = {?i}\} = \{i \mid \exists v \in V_b \text{ with } \rho_b(v) = {?i}\} = \{1, \ldots, n\}.$$

If $\mathfrak{G}_a, \mathfrak{G}_b$ are two QGwCs with arity $n$ such that $\mathfrak{G}_b$ is derived from
$\mathfrak{G}_a$ with this calculus, we write $\mathfrak{G}_a \vdash_n \mathfrak{G}_b$.

In fact, it can be shown that this calculus is complete (again, the proof for 
this and the following proofs is omitted here, but can be found in the extended 
version of this paper). The next theorem, which states the completeness, is the 
first main result of this paper. 

Theorem 1 (The Calculus for QGwCs is Complete). 

The calculus $\vdash_n$ is complete.
5 Normed Query Graphs



The relation graphs, as they have been described in the introduction, have some
simple syntactical restrictions which are not adopted for query graphs. If a
relation graph describes a relation of arity $n$, it has exactly $n$ pending edges
(one edge for each unsaturated valence of the relation), and these pending edges
end on the sheet of assertion. In contrast to that, in a QGwC, a query marker ?i
may occur in an arbitrary number of concept boxes, each of them placed in an
arbitrary context. In the following, we restrict the system of QGwCs in order
to get a class of graphs which corresponds more closely to relation graphs. That
is, we consider QGwCs where each query marker ?i appears only once, namely
in a concept box $\boxed{\top : ?i}$ placed directly on the sheet of assertion. These
graphs are called normed QGwCs.

We have provided a sound and complete calculus for (not necessarily normed)
QGwCs. It is not obvious whether this calculus is still complete if we restrict it
to the class of normed query graphs. In fact, the rules ‘splitting a vertex’ and
‘merging two vertices’ have to be slightly extended. Usually, if a vertex
$v_1 := \boxed{P : g}$ is split, a new vertex $v_2 := \boxed{\top : g}$ is inserted. This
is captured by the condition $\rho(v_1) = \rho(v_2)$ in Def. 5. Note that if we split
a query vertex $\boxed{P : ?i}$ with this form of the rule ‘splitting a vertex’, the
derived graph contains (at least) two vertices with the referent ?i, hence the
derived graph is not normed. Thus, in the class of normed QGwCs, this rule can
never be applied. In order to make this rule and the rule ‘merging two vertices’
usable for the class of normed QGwCs, the condition $\rho(v_1) = \rho(v_2)$ is
weakened in both rules to $\rho(v_1) \leq \rho(v_2)$. That is, we set:



Merging Two Vertices (Extended Version)

Let $e \in E^{id}$ be an identity link with $\nu(e) = (v_1, v_2)$ such that
$cut(v_1) \geq cut(e) = cut(v_2)$, $\rho(v_1) \leq \rho(v_2)$ and $\kappa(v_2) = \top$
hold. Then $v_1$ may be merged into $v_2$, i.e., $v_1$ and $e$ are erased and, for
every edge $e \in E$, $e|_i = v_1$ is replaced by $e|_i = v_2$.

Splitting a Vertex (Extended Version)

Let $g \in \mathcal{G} \cup \{*\}$. Let $v = \boxed{P : g}$ be a vertex in the context $c_0$
and incident with relation edges $R_1, \ldots, R_n$, placed in contexts
$c_1, \ldots, c_n$, respectively. Let $c$ be a context such that
$c_1, \ldots, c_n \leq c \leq c_0$. Then the following may be done: In $c$, a new
vertex $v' = \boxed{\top : g'}$ with $g' \geq g$ and a new identity-link between $v$
and $v'$ are inserted. On $R_1, \ldots, R_n$, arbitrary occurrences of $v$ are
substituted by $v'$.



The calculus obtained from $\vdash_n$ with the old rules ‘merging two vertices’ and
‘splitting a vertex’ replaced by their extended versions shall be denoted by
$\Vdash_n$.

The generalized rules can be derived from the calculus $\vdash_n$, thus the new
calculus is still sound. Again, this proof is omitted, but it can be found in the
extended version of this paper.
It remains to show that $\Vdash_n$ is a complete calculus for normed QGwCs. In
order to show this, we first assign to each QGwC $\mathfrak{G}$ a normed QGwC
$norm(\mathfrak{G})$ as follows:

1. For each $i = 1, \ldots, ar(\mathfrak{G})$, we add a new vertex
$v_{?i} := \boxed{\top : ?i}$ to the sheet of assertion.

2. Then, for each vertex $v \neq v_{?i}$ with $\rho(v) = {?i}$, an identity link
between $v_{?i}$ and $v$ is added.

3. Finally, for each vertex $v \neq v_{?i}$ with $\rho(v) = {?i}$, its referent ?i is
replaced by the generic marker *.
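
The three steps can be sketched in Python on a deliberately simplified graph
encoding. The encoding (a dictionary of vertices and a set of identity links,
with contexts given as plain labels), the function name `normalize`, the
vertex identifiers `v_?i`, and the choice to place each new identity link in
the context of the old vertex are all assumptions of this sketch, not part of
the formal definition.

```python
def normalize(vertices, identity_links, arity):
    """Sketch of norm(G) on a simplified encoding.

    vertices: dict vertex_id -> (concept_name, referent, context)
    identity_links: set of (vertex_id, vertex_id, context)
    Query markers are the strings "?1", ..., "?n"; "*" is the generic marker.
    Assumes no existing vertex id has the form "v_?i".
    """
    new_vertices = dict(vertices)
    new_links = set(identity_links)
    for i in range(1, arity + 1):
        marker = f"?{i}"
        query_vertex = f"v_?{i}"
        # Step 1: add the new vertex TOP : ?i on the sheet of assertion.
        new_vertices[query_vertex] = ("TOP", marker, "sheet")
        for vid, (concept, referent, context) in vertices.items():
            if referent == marker:
                # Step 2: identity link between v_?i and the old vertex
                # (placed in the old vertex's context: an assumption).
                new_links.add((query_vertex, vid, context))
                # Step 3: replace the referent ?i by the generic marker.
                new_vertices[vid] = (concept, "*", context)
    return new_vertices, new_links


# Hypothetical input graph of arity 1 with two occurrences of ?1.
example_vertices = {
    "u": ("P", "?1", "c1"),
    "w": ("Q", "?1", "sheet"),
    "x": ("R", "g", "sheet"),
}
norm_vertices, norm_links = normalize(example_vertices, set(), 1)
```

After normalization, each query marker occurs exactly once, in a TOP vertex on
the sheet of assertion, which is the defining property of a normed QGwC.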



Example. 

In the example below, the right graph is the normalization of the left graph. 



[Diagram: a QGwC (left) and its normalization (right).]




The next lemma shows that $\mathfrak{G}$ and $norm(\mathfrak{G})$ are provably equivalent. But
even if $\mathfrak{G}$ is a normed QGwC, we do not have $\mathfrak{G} = norm(\mathfrak{G})$.
Nonetheless, it is easy to see that $norm(\mathfrak{G})$ can be derived from $\mathfrak{G}$ by a
simple application of the extended ‘splitting a vertex’ rule. Thus, for a normed
QGwC $\mathfrak{G}$ with $ar(\mathfrak{G}) = n$, we have $\mathfrak{G} \Vdash_n norm(\mathfrak{G})$ and
$norm(\mathfrak{G}) \Vdash_n \mathfrak{G}$.

Lemma 2 ($\mathfrak{G}$ and $norm(\mathfrak{G})$ are Equivalent).

Let $\mathfrak{G}$ be a QGwC. Then $norm(\mathfrak{G})$ is a normed QGwC which is syntactically
equivalent (with $\vdash_n$ or $\Vdash_n$) to $\mathfrak{G}$.

As the graphs $\mathfrak{G}$ and $norm(\mathfrak{G})$ are syntactically, and hence semantically,
equivalent, the class of normed QGwCs has the same expressiveness as the class
of QGwCs.

Now let $\mathfrak{G}_a$ and $\mathfrak{G}_b$ be two normed QGwCs of arity $n$ with
$\mathfrak{G}_a \models_n \mathfrak{G}_b$. As $\vdash_n$ is complete, we have
$\mathfrak{G}_a \vdash_n \mathfrak{G}_b$ as well. The proof for $\mathfrak{G}_a \vdash_n \mathfrak{G}_b$ is a
sequence of graphs, but this sequence is a sequence of QGwCs which are not
necessarily normed. But the proof can be transformed into a proof in the class of
normed QGwCs, together with the calculus $\Vdash_n$. Let us consider an example,
where the iteration-rule of $\vdash_n$ is applied to a QGwC (the iterated subgraph
$\mathfrak{G}_0$ is marked by the dashed line):
[Diagram: the QGwC $\mathfrak{G}_a$, with concept boxes Q : ?2 and P : ?1, and the graph
$\mathfrak{G}_b$ derived from it by iteration.]

We cannot derive $norm(\mathfrak{G}_b)$ from $norm(\mathfrak{G}_a)$ by a simple application of the
iteration-rule, but in the class of normed QGwCs, we can construct a proof for
$norm(\mathfrak{G}_a) \Vdash_n norm(\mathfrak{G}_b)$, which is as follows:

We start with $norm(\mathfrak{G}_a)$.

[Diagram]

The query vertices are split such that their copies (we will call them $w_i$) are
placed in the context where $\mathfrak{G}_0$ is iterated into.

[Diagram]

In the derived graph, we have a subgraph which corresponds to $\mathfrak{G}_0$ (it is
marked with dashed lines in the diagram above). Particularly, this subgraph
contains the vertices $w_i$. Now this subgraph is iterated, and we insert an
identity link from each $w_i$ to its copy.

[Diagram]

Now the copies of the vertices $w_i$ are merged back into their origins $w_i$.

[Diagram]

Finally, the vertices $w_i$ are merged back into the query vertices. This step
yields $norm(\mathfrak{G}_b)$.

[Diagram]
The underlying idea of this proof can be carried over to all rules in the
calculus $\vdash_n$. This yields the following theorem:

Theorem 2 (Transformation of $\vdash_n$ into $\Vdash_n$).

Let $\mathfrak{G}_a, \mathfrak{G}_b$ be two QGwCs such that $\mathfrak{G}_b$ is derived from $\mathfrak{G}_a$ by
applying one of the rules of the calculus $\vdash_n$. Then we have
$norm(\mathfrak{G}_a) \Vdash_n norm(\mathfrak{G}_b)$, where the proof contains only normed QGwCs.

Again, this proof can be found in the extended version of this paper.

With the preceding theorem, it follows immediately that $\Vdash_n$ is complete.
This is the second main result of this paper.

Corollary 1 (Completeness of $\Vdash_n$ for Normed QGwCs).

Let $\mathfrak{G}_a, \mathfrak{G}_b$ be two QGwCs with $\mathfrak{G}_a \vdash_n \mathfrak{G}_b$. Then we have
$norm(\mathfrak{G}_a) \Vdash_n norm(\mathfrak{G}_b)$, where the proof contains only normed QGwCs.
Particularly, the calculus $\Vdash_n$ is complete for normed query graphs.

6 Conclusion and Further Research 

There are two viewpoints for relation graphs: From an algebraic point of view, 
operations on graphs, corresponding to operations on relations, have to be in- 
vestigated. This has been done by Burch, Pollandt and Wille. From a logical 
point of view, inference rules have to be investigated. This is the purpose of this 
paper. Of course, these viewpoints are not competing but complementary. To
make the results of this paper fruitful for the theory of existential graphs and 
relation graphs, the relationships between existential graphs and concept graphs 
resp. between relation graphs and query graphs have to be further elaborated. 
A first approach for existential graphs can be found in [2]. 

Relation graphs can easily be defined inductively. So it seems appropriate 
to provide an inductive definition for query graphs as well and to discuss the 
advantages and disadvantages of non-inductive and inductive definitions. This is 
particularly important for logicians who are much more familiar with inductive 
definitions and their use in many proofs, e.g., for formulas. 

The system of query graphs can be syntactically extended. A crucial extension
is the addition of so-called nestings, where whole subgraphs of a graph are
enclosed in a vertex. There are different possibilities for interpreting nestings.
They are often used to describe specific contexts, e.g., situations. In [5],
nestings are used to describe nested relations which occur in the form of
so-called set functions in database systems. The implementation of nestings has to
be further investigated.
Abstract. Constraint diagrams are a diagrammatic notation which may
be used to express logical constraints. They were designed to complement 
the Unified Modeling Language in the development of software systems. 
They generalize Venn diagrams and Euler circles, and include facilities 
for quantification and navigation of relations. Due to the lack of a lin- 
ear ordering of symbols inherent in a diagrammatic language which ex- 
presses logical statements, some constraint diagrams have more than one 
intuitive meaning. We generalize, from an example based approach, to 
suggest a default reading for constraint diagrams. This reading is usually 
unique, but may require a small number of simple user choices. 

Keywords: Visual formalisms, diagrammatic reasoning, software spec- 
ification, formal methods, constraint diagrams 



1 Introduction 

The Unified Modeling Language (UML) [ 2] is the Object Management Group’s 
industrial standard for software and system modelling. It has accelerated the 
uptake of diagrammatic notations for designing systems in the software industry. 

In this paper, we are concerned with a diagrammatic notation, constraint 
diagrams , which may be used to express logical constraints, such as invariants 
and operation preconditions and postconditions, in object-oriented modelling. 
It was introduced in [10] for use in conjunction with UML. In [1], progress was 
made towards a more diagrammatic version of the Object Constraint Language 
(OCL), which is essentially a textual, stylised form of first order predicate logic. 
Constraint diagrams could be used as a possible alternative. Furthermore,
this notation is compared with a Z-based notation as a modelling language
in its own right, in [9].

Constraint diagrams were developed to enhance the visualization of object 
structures. Class diagrams show relationships between objects, as associations 
between classes, for example. Annotating, with cardinalities and aggregation, for 
example, enables one to exhibit some properties of these relationships between 
objects. However, frequently one wishes to exhibit more subtle properties, such 
as those of composite relations. This is impossible using class diagrams, but the 
inherent visual structure of constraint diagrams makes this, and many other
constructions, easy to express. 

Constraint diagrams build on a long history of using diagrams to visualize 
logical or set-theoretical assertions. They generalize Venn diagrams [15] and Eu- 
ler circles [2], which are currently rich research topics, particularly as the basis 
of visual formalisms and diagrammatic reasoning systems [13, 6, 7, 8, 14]. Con- 
straint diagrams are considerably more expressive than these systems because 
they can express relations, whilst still retaining the elegance of the underlying 
diagrammatic systems. For constraint diagrams to be used effectively in software 
development, it is necessary to have strong tool support. Such tools are currently 
under development [11, 5]. 

In [3], we described a ‘reading’ algorithm which produces a unique semantic 
reading for a constraint diagram, provided we place a certain type of partial 
order (represented by a reading tree) on syntactic elements of the diagram. This 
process begins with the construction of a unique, partially directed graph from 
the diagram. This dependence graph describes the dependences between certain 
syntactic elements of the diagram. Then, information from this graph allows 
one to construct reading trees. The constraint diagram, together with a single 
reading tree, determines a unique semantic interpretation, using a model-based 
approach in first order predicate logic. Models for a diagram are, essentially, 
assignments of sets to contours, of relations to arrows and of elements to spiders, 
which respect the constraints imposed by the logical formula. In this paper we 
just give the logical formula, and call it the reading of the diagram. See [3] for 
more details. In [4], a ‘tree-construction’ algorithm is described, which produces 
all possible reading trees, given a dependence graph, and hence all readings for 
a given diagram. 

Extending the syntax of constraint diagrams to include a reading tree is one 
possibility, which may be appropriate for an advanced user who wishes to be 
able to express complex logical constraints. However, we wish to keep the basic 
syntax of the notation as simple as possible in order to facilitate its learning 
by a new user. Thus we seek a default reading of a diagram. This corresponds 
to a particular choice of reading tree, which is chosen automatically due to 
properties of the diagrams themselves, and so the user does not need to see 
a dependence graph or a reading tree at all. In this paper, we make progress 
towards a default reading, firstly by placing a sensible restriction on the reading 
trees and showing that a total ordering of the set of spiders in each “dependent 
piece” (connected component of the dependence graph) is sufficient information 
to produce a sensible unique reading of any diagram. Secondly we investigate 
the possibility of applying criteria which are, in the opinion of the authors, an 
intuitive set of choices which indicate how to read a diagram. These choices 
further reduce the number of available readings of a diagram and often produce 
a unique reading. In the future, usability studies will be performed on this default 
reading to assess its suitability. 

In Sect. 2 we give a concise description of constraint diagrams and the “in- 
tended semantics” of individual pieces of syntax. In Sect. 3 we outline some of 
the ambiguities that may arise in choosing how to combine these in order to
give semantics to a diagram. Possible resolutions of these ambiguities are given. 
Then, in Sect. 4, we give examples encapsulating the notions of dependence be- 
tween certain syntactic elements of the diagram. This dependence corresponds 
to the relevant pieces of intended semantic statements requiring reference to 
each other. A complete description of the dependences can be found in [4]. This 
enables us to associate a unique dependence graph to a diagram. In Sect. 4.1, 
we show how a total order on the set of spiders in each connected component
of the dependence graph of a diagram gives rise to a unique semantic interpre- 
tation of the diagram. In Sect. 5, we develop a formalism which enables one to 
construct our default reading of a diagram. Essentially, one uses the structure 
of the arrows in the diagram to define an ordering on the spiders which gives 
rise to a reading of the diagram. Finally, in Sect. 6, we highlight the usefulness 
of this work and indicate possibilities for further investigation. 

2 Constraint Diagrams 

A contour is a simple closed curve in the plane. The area of the plane which 
constitutes the whole diagram is a basic region. Furthermore, the bounded area 
of the plane enclosed by a contour c is the basic region of c. A region is defined 
recursively: a basic region is a region, and any non-empty union, intersection, 
or difference of regions is a region. A zone is a region which contains no other 
region. A zone may be shaded. A region is shaded if it is a union of shaded zones. 

A spider is a tree with nodes (called feet ) in distinct zones. It touches any 
region which contains (at least) one of its feet. The union of zones that a spider 
touches is called the spider’s habitat. A spider is either an existential spider, 
whose feet are drawn as dots, or a universal spider, whose feet are drawn as 
asterisks. 

The source of a labelled arrow may be a contour or a spider. The target 
of a labelled arrow may be a contour or a spider. A contour is either a given 
contour, which is labelled, or a derived contour, which is unlabelled and is the 
target of some arrow. 

For example, in Fig. 1, there are two given contours, labelled by A and B, 
and one derived contour. These contours determine the five zones of the diagram. 
There is a universal spider, labelled by $x$, in the zone which is “inside $A$ and
outside $B$”. This spider is the source of an arrow labelled $f$, which targets the
derived contour. There are two existential spiders in the diagram, one of which
has two feet and is labelled by $s$ and the other has three feet and is labelled
by $t$. The habitat of $s$ is the region “inside $B$” and the habitat of $t$ is the
region “outside $A$”.

Given contours represent sets. Intersection, inclusion and exclusion properties 
of contours are respected by the sets. Arrows represent relations. Existential 
spiders assert existence of elements in their containing set (represented by their 
habitat). Universal spiders represent universal quantification. Derived contours 
represent the image of a relation. Distinct spiders represent distinct elements. 
A shaded region with n existential spiders touching it represents a set with no 
more than n elements. 

The diagram in Fig. 1 can be interpreted as “There are two sets $A$ and $B$.
There is an element, $s$, in $B$ and an element $t$ outside $A$, such that $s \neq t$.
Every element $x$ in $A - B$ is related, by $f$, to some set which is outside $A$
and $B$ (and may be different for each $x$)”.

We will use the standard object-oriented notation $x.f$ to represent textually
the relational image of element $x$ under relation $f$, that is,
$x.f = \{y : (x, y) \in f\}$. Thus $x.f$ is the set of all elements related to $x$
under relation $f$. The expression $x.f$ is a navigation expression, so called
because we can navigate from $x$ along the arrow $f$ to the set $x.f$. The
relational image of a set $S$ is then defined by

$$S.f = \bigcup_{x \in S} x.f.$$
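
As a small illustration, the two relational images can be computed in Python
for a relation given as a set of pairs. The function names `image_of_element`
and `image_of_set` and the sample relation are assumptions made for this
sketch.

```python
def image_of_element(x, f):
    """x.f = {y : (x, y) in f} for a relation f given as a set of pairs."""
    return {y for (a, y) in f if a == x}


def image_of_set(S, f):
    """S.f = union of x.f over all x in S (empty if S is empty)."""
    return set().union(*(image_of_element(x, f) for x in S))


# Hypothetical relation used only for illustration.
f = {(1, "a"), (1, "b"), (2, "b"), (3, "c")}
img_1 = image_of_element(1, f)
img_12 = image_of_set({1, 2}, f)
img_empty = image_of_set(set(), f)
```

Note that the image of the empty set is empty, which matters when a
universally quantified spider ranges over an empty set.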



3 Reading Ambiguities 

In order to describe the difficulties in assigning semantics to a diagram, we give 
some examples of constraint diagrams, together with possible semantic interpre- 
tations, in the form of logical formulae. These formulae are constructed via the 
reading algorithm in [3] and the tree-construction algorithm in [4]. The general 
idea of the formulae, if not the detail involved, should be apparent from the 
diagram. Note that, for visual clarity, we have chosen to use square brackets to
denote the scope of the universal quantifiers in the logical formulae, and that
$\overline{A}$ indicates the complement of $A$.

In Fig. 2, we can read the universal spider, $x$, before the existential spider, $s$,
to give the first reading, or afterwards to give the second:

$$A \cap B = \emptyset \wedge \forall x \in A\, [\, \exists s \in B\, (\, x.f = s \,)\, ]$$

$$A \cap B = \emptyset \wedge \exists s \in B\, (\, \forall x \in A\, [\, x.f = s \,]\, ).$$
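
That the two readings really differ can be checked on a small model. The
following Python sketch uses hypothetical sets and a hypothetical relation `f`,
and it interprets the condition written x.f = s as "the image of x is exactly
the singleton containing s", which is an assumption of this sketch.

```python
# Hypothetical model (assumption): each element of A is mapped to a
# different element of B.
A = {1, 2}
B = {"a", "b"}
f = {(1, "a"), (2, "b")}


def img(x, rel):
    """Relational image of x under rel."""
    return {y for (a, y) in rel if a == x}


disjoint = A.isdisjoint(B)

# Universal spider read first: for every x there is some s.
reading_1 = disjoint and all(any(img(x, f) == {s} for s in B) for x in A)

# Existential spider read first: one s works for every x.
reading_2 = disjoint and any(all(img(x, f) == {s} for x in A) for s in B)
```

In this model the first reading holds and the second fails, since no single
element of B is the image of both elements of A.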

Choosing a total order on the set of spiders in the diagram would resolve this
ambiguity. In some instances, we may wish for some collections of syntactic
elements to represent semantic statements which are independent of the statements
arising from other collections. For example, in Fig. 3, the given contour, $C$,
“breaks up the diagram into independent pieces”, since the statements involving
$x$ and $f$ do not need to reference those involving $s$ and $g$.

Fig. 2. Ordering of quantifiers

Fig. 3. Non-nested quantification

Fig. 4. Reading arrows as soon as they can be

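The quantifiers do not need to be nested, and the semantics we will wish 
to give as a default are: 

\[ A \cap B = \emptyset \wedge A \cap C = \emptyset \wedge B \cap C = \emptyset \wedge \forall x \in A \, [\, x.f = C \,] \wedge \exists s \in B \, ( s.g = C ). \]

Note that we did not choose to interpret this diagram with the quantifiers nested; 
compare the formula above with the following formula, in the case that A is 
empty: 

\[ A \cap B = \emptyset \wedge A \cap C = \emptyset \wedge B \cap C = \emptyset \wedge \forall x \in A \, [\, x.f = C \wedge \exists s \in B \, ( s.g = C ) \,]. \]

If A is empty, the universally quantified part of the nested formula holds 
vacuously, so it no longer asserts the existence of s in B, whereas the default 
formula still does. 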

This notion of independent pieces is formalised in the form of dependences be- 
tween certain syntactic elements of the diagram (see Sect. 4). A (special) partial 
order on the set of spiders which are in “independent pieces” of the diagram 
resolves the ambiguity here. 

In fact, the ordering of the arrows as well as of the spiders can have an effect 
on the semantics. For example, in Fig. 4 one could read the elements x, then y, 
then f, and then g, say, to give the semantic interpretation: 

\[ A \cap B = \emptyset \wedge \forall x \in A \, [\, \forall y \in B \, [\, x.f \subseteq \overline{A} \cap \overline{B} \wedge ( y.g \subseteq x.f ) \,] \,]. \]

This is a possible interpretation of the diagram, but has the consequence that 
if B is empty then the relation f is “ignored” in the logical formulae. Reading 
the elements in the order x, then f, then y and then g, gives the more intuitive 
reading: 

\[ A \cap B = \emptyset \wedge \forall x \in A \, [\, x.f \subseteq \overline{A} \cap \overline{B} \wedge \forall y \in B \, [\, y.g \subseteq x.f \,] \,]. \]

We forbid the reading with the awkward side-effects by enforcing that arrows 
are defined as soon as possible. That is, an arrow whose target is not a spider 
is read as soon as its source is defined and an arrow whose target is a spider is 
read as soon as both its source and target are defined. 

This assumption that arrows are defined as soon as possible greatly reduces 
the number of possible readings of a diagram as well as removing readings with 
awkward side-effects. Note that if there is a choice of arrows which can be de- 
fined at a particular time, then choosing any order will give equivalent readings, 
because the associated logical statements are conjoined. 

4 Dependences 

The informal semantics of a constraint diagram consists of a collection of pieces 
of information, some of which are related and need to be ordered. For the formal 
specification of the semantics, we will need to know precisely which diagrammatic 
elements are related to each other and need to be ordered. For example, if two 
spiders’ habitats intersect (they have a foot in the same zone) then we wish to 
say that the elements, s and t, represented by the spiders, are not equal, forcing 
the scope of quantification of s to encompass that of t, or vice versa. The precise, 
formal definitions of dependence criteria, which encapsulate all of the necessary 
information of this type, are given in [3]. 

By reference to examples, we describe informally the dependences of the 
spiders and the arrows, which are the relevant syntactic elements of the diagram. 
A graph, called the dependence graph 1 of the diagram, is obtained from this 
information. It has nodes which correspond to each spider and arrow in the 
diagram, and these are connected by edges if there is a dependence between 
the spiders or arrows. In order to illustrate the dependence graphs, they will be 
drawn alongside the diagrams themselves in the following figures. The following 
examples illustrate various types of dependence. 

In Fig. 5, u and s are dependent because they both touch the zone $B \cap A \cap C$. 

In Fig. 6, x and f are dependent because x is the source of f. Similarly 
for x and g. Also s and f are dependent because s is the target of f. There is 
dependence between g and t because the description of the habitat of t involves 
the target of g. 

In Fig. 7, h and g are dependent because they target derived contours which 
are described in terms of each other. Arrows f and g are dependent because 
the source of g is the target of f. Finally, i and j are dependent because of the 
shading (which gives rise to the statement $C.i \cap B.j = \emptyset$). 

1 The dependences given here are a slightly simplified version of those given in [3]. 
Thus we obtain an unoriented version of the dependence graph. The orientation 
requirements have been taken into account by asserting that arrows are defined as 
soon as possible. 

Fig. 5. Spider to spider dependence 

Fig. 6. Spider and arrow dependence 

Fig. 7. Arrow to arrow dependence 

4.1 Total Order of Spiders 

Computing the connected components of the dependence graph breaks up the 
diagram into “independent pieces”. For each of these pieces, giving a total order 
on the spiders and arrows in this independent piece gives a reading of that piece 
of the diagram (by writing the quantifiers and statements in a single formula 
where the scope of quantification of each quantifier extends to the end of the 
formula). However, one may also construct a reading, using this ordering, in 
which the scope of the quantifiers is as small as possible. Furthermore, due to 
our assumption that arrows are defined as soon as they can be, a total order on 
the set of spiders in each connected component of the dependence graph gives 
rise to a sensible reading of the diagram. We endeavor to define such a total 
order of spiders (and thus a reading) from the diagram syntax itself and this 
gives our default reading. 

Fig. 8. Restricting the scope of the quantifiers 
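The step of splitting a diagram into independent pieces can be sketched as an ordinary connected-components computation. The edge set below is an assumption: it is chosen to be consistent with what this section says about the dependence graph of Fig. 8 (connected as a whole, and falling into the pieces s, f, g and h, y, i, t once x is deleted), and is not taken from the formal definitions in [3].

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def without_node(nodes, edges, removed):
    """Delete one node, and every edge touching it, from the graph."""
    kept_nodes = [n for n in nodes if n != removed]
    kept_edges = [(a, b) for a, b in edges if removed not in (a, b)]
    return kept_nodes, kept_edges


# Assumed dependence graph: spiders x, y, t, s and arrows f, g, h, i.
nodes = ["x", "y", "t", "s", "f", "g", "h", "i"]
edges = [("x", "f"), ("x", "h"), ("s", "f"), ("s", "g"),
         ("y", "h"), ("y", "i"), ("t", "i")]

whole = connected_components(nodes, edges)
pieces = connected_components(*without_node(nodes, edges, "x"))
print(len(whole))                  # 1
print([sorted(p) for p in pieces])
```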

For example, in Fig. 8, consider the total order of the spiders (x, y, t, s). Due 
to the assumption that arrows are defined as soon as they can be, this gives 
rise to the total order of the spiders and arrows (x, h, y, i, t, s, f, g). Since the 
dependence graph is connected, but deleting x from the dependence graph leaves 
two distinct connected components containing s, f, g and h, y, i, t, respectively, 
we can restrict the scope of the quantifiers as in the following reading: 

\[
\begin{aligned}
& A \cap B = \emptyset \wedge A \cap C = \emptyset \wedge B \cap C = \emptyset \wedge {} \\
& \forall x \in A \, [\, ( \exists s \in B \, ( x.f = s \wedge s.g = C ) ) \\
& \qquad \wedge \, ( x.h \subseteq \overline{A} \cap \overline{B} \cap \overline{C} \wedge \forall y \in x.h \, [\, y.i \subseteq C \wedge \exists t \in y.i \,] ) \,].
\end{aligned}
\]
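The rule that arrows are read as soon as possible turns a spider order into an order on spiders and arrows. The following sketch is a simplified illustration of that rule. The arrow endpoints are assumptions inferred from the formula above (f from x to s, g from s to the contour C, h and i targeting derived contours), and any source or target that is not a spider is treated as already available.

```python
def interleave_arrows(spider_order, arrows):
    """Insert each arrow into the spider order as early as the rule allows.

    arrows is a list of (name, source, target). An arrow whose target is not a
    spider is read once its source is defined; an arrow whose target is a
    spider is read once both source and target are defined. Simplification:
    contours (given or derived) count as defined from the start.
    """
    spiders = set(spider_order)
    defined = set()
    pending = list(arrows)
    reading = []
    for spider in spider_order:
        reading.append(spider)
        defined.add(spider)
        still_pending = []
        for name, source, target in pending:
            source_ready = source not in spiders or source in defined
            target_ready = target not in spiders or target in defined
            if source_ready and target_ready:
                reading.append(name)
            else:
                still_pending.append((name, source, target))
        pending = still_pending
    return reading


# Assumed arrow structure for Fig. 8 (hypothetical labels for derived contours).
arrows = [("f", "x", "s"), ("g", "s", "C"), ("h", "x", "x.h"), ("i", "y", "y.i")]
print(interleave_arrows(["x", "y", "t", "s"], arrows))
# ['x', 'h', 'y', 'i', 't', 's', 'f', 'g']
```

When several arrows become readable at the same moment, as f and g do here, the sketch keeps their input order; by the remark at the end of Sect. 3, any order gives an equivalent reading.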

5 Default Reading 

Given a total order on the set of spiders in each independent piece (connected 
component of the dependence graph) of a diagram, we can give semantics to the 
diagram (without the need for a user to choose a reading tree). Now, we want 
to generate a default reading of a diagram. So we set up a framework which 
allows us to choose an ordering of the spiders from the diagram, possibly with 
some user choice, but often removing the need for user-interaction. Interestingly, 
although the spiders are the syntactic elements that require ordering, it appears 
that the arrows give an indication of how to order them. 

5.1 Arrow Chains 

Arrows in a diagram give an intuitive direction to follow in a reading. We en- 
capsulate a notion of following arrows in a diagram. 

Definition 1. A pair of arrows, (f, g), are strongly bound together if T(f), the 
target of f, is a spider or a derived contour which is equal to S(g), the source 
of g. They are weakly bound together if: 

- either T(f) is a derived contour and S(g) is a spider whose habitat’s 
description involves T(f), or 

- T(f) and S(g) are spiders whose habitats intersect. 

The pair are said to be bound together if they are either strongly or weakly bound 
together. 

Fig. 10. A chain of arrows 

In Fig. 9, (f, g) is a strongly bound pair of arrows and (f, h) is a weakly bound 
pair of arrows. Visually, strongly bound pairs of arrows are more appealing than 
weakly bound pairs of arrows. 

Definition 2. A chain of arrows is a sequence of arrows $(f_1, \ldots, f_n)$ such that 
each consecutive pair of arrows is bound together. It is of maximal length if it 
is not a subsequence of any other chain of arrows. 

In Fig. 10, (f, g, h) is the maximal length chain of arrows. Note that there can 
be more than one maximal length chain of arrows in a diagram and that these 
chains can be of different lengths. In Fig. 9, both of the chains (f, g) and (f, h) 
are of maximal length. 
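Definition 2 can be explored with a brute-force sketch. It takes the bound pairs as given input (deciding whether a pair is bound needs the diagram itself), and it assumes that no arrow occurs twice in a chain so that the enumeration terminates. The bound pairs used for Fig. 9 and Fig. 10 are those stated in the text.

```python
def all_chains(arrows, bound_pairs):
    """Enumerate every chain of arrows in which no arrow repeats."""
    successors = {a: [] for a in arrows}
    for first, second in bound_pairs:
        successors[first].append(second)
    chains = []

    def extend(chain):
        chains.append(tuple(chain))
        for nxt in successors[chain[-1]]:
            if nxt not in chain:
                extend(chain + [nxt])

    for arrow in arrows:
        extend([arrow])
    return chains


def is_subsequence(short, long):
    """True if short can be obtained from long by deleting elements."""
    remaining = iter(long)
    return all(item in remaining for item in short)


def maximal_chains(arrows, bound_pairs):
    """Chains that are not a subsequence of any other chain."""
    chains = all_chains(arrows, bound_pairs)
    return [c for c in chains
            if not any(c != other and is_subsequence(c, other) for other in chains)]


# Fig. 9: (f, g) strongly bound, (f, h) weakly bound.
print(maximal_chains(["f", "g", "h"], [("f", "g"), ("f", "h")]))
# [('f', 'g'), ('f', 'h')]

# Fig. 10: a single chain (f, g, h), assuming bound pairs (f, g) and (g, h).
print(maximal_chains(["f", "g", "h"], [("f", "g"), ("g", "h")]))
# [('f', 'g', 'h')]
```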

Definition 3. The spiders associated with a chain of arrows are the spiders 
which occur as sources or targets of arrows in the chain. The order of the arrows 
in the chain induces an ordering of these spiders. The starting spider of the 
chain is the spider which appears first in this ordering. 

In Fig. 10, the chain (f, g, h) gives rise to the total order of spiders (x, s, t), with 
starting spider x. 

In Fig. 11, there is only one arrow chain, (f, g), of maximal length. Note that 
(g, f) is not an arrow chain because the habitat of x is A, which does not require 
reference to the derived contour s.g. Using the arrow chain (f, g), we obtain the 
total order of spiders, (x, s), with starting spider x, and the reading: 

\[ A \cap B = \emptyset \wedge \forall x \in A \, [\, x.f \subseteq B \wedge \exists s \in B - x.f \, ( s.g \subseteq A ) \,]. \]
Fig. 11. Maximal length arrow chain (f, g) 

Fig. 12. The domain of quantification is not necessarily represented by the habitat of 
the spider 

Fig. 13. Start reading at the spider whose habitat represents the domain of 
quantification 



5.2 Domains of Quantification 

In order to read certain diagrams one must allow the “domain of quantification” 
to be different from the set represented by the habitat. That is, the set over 
which we quantify may be represented by a contour, for example, containing the 
spider, but not the closest containing contour. 

For example, in the symmetric case of Fig. 12, one must choose whether to 
start reading at x or at y. If one starts at x, say, then the domain of quantification 
of x is the whole set represented by the given contour A, whereas its habitat is 
the derived contour containing x. In this case, the reading is: 

\[ A \cap B = \emptyset \wedge \forall x \in A \, [\, x.f \subseteq B \wedge \forall y \in x.f \, [\, y.g \subseteq A \wedge x \in y.g \,] \,]. \]
In Fig. 13, both (f, g) and (g, f) are chains of maximal length. Consider the 
chain (f, g). This gives rise to the total order of spiders (x, s). For each quantifier 
(using this ordering), the domain of quantification is equal to the set represented 
by the habitat of the corresponding spider. This is not true for the ordering (s, x) 
given by the chain (g, f); the habitat of s is then the region “outside A and x.f”, 
but the domain of quantification for s is “outside A”. The chain (f, g) gives rise 
to the reading: 

\[ A \cap B = \emptyset \wedge \forall x \in A \, [\, x.f \subseteq B \wedge \exists s \in \overline{A} \cap \overline{x.f} \, ( s.g = x ) \,]. \]



Definition 4. A chain of arrows respects the habitats of its spiders if, when 
the associated total order on the spiders is used to construct the semantics, the 
habitat of each spider represents its domain of quantification. 

In Fig. 12, neither maximal chain respects the habitats of its spiders. In Fig. 13, 
(f,g) respects the habitats of its spiders but (g,f) does not. 

It is also necessary to prioritize any conditions which are used to choose 
a reading of a diagram. For example, in Fig. 14, there are two maximal length 
chains of arrows (f, h) and (g, h). Since (f, h) is weakly bound and (g, h) is 
strongly bound, (g, h) is preferable to (f, h) with regard to this condition. However, 
(f, h) respects the habitats of its spiders, whereas (g, h) does not (in the 
corresponding logical formula, the statement $\exists s \in B$ occurs, and the statement 
$s \in x.f$ occurs elsewhere, when the derived contour containing s is defined). Thus 
we assert that respecting the habitats of spiders is a condition which is of higher 
priority than following strongly bound chains in preference to weakly bound 
chains. Using the chain (f, h) gives the order of spiders (x, s, t), and therefore 
the reading: 

\[
\begin{aligned}
& A \cap B = \emptyset \wedge A \cap C = \emptyset \wedge B \cap C = \emptyset \wedge {} \\
& \forall x \in A \, [\, x.f \subseteq B \wedge \exists s \in x.f \, ( C.g = s \wedge \exists t \in \overline{A} \cap \overline{B} \cap \overline{C} \, ( s.h = t ) ) \,].
\end{aligned}
\]
5.3 Merging Orders 

A chain of arrows gives rise to an order on its associated set of spiders. If there 
are distinct chains of arrows in the diagram which have arrows in common, then 
we may wish to merge the total orders of the spiders together appropriately. 

Definition 5. An ordered pair of spider orderings, $(so_1, so_2)$, can be merged to 
give a single spider ordering by identifying identical elements in $so_1$ and $so_2$, 
and then by inserting the elements of $so_2$ into $so_1$ as far to the right as possible, 
respecting the ordering given in $so_2$. 

Example 1. Consider two spider orderings, (a, b, c, d, e, f, g, h) and 
(w, c, x, y, f, z). Note that these have two elements c and f in common. 
We can merge in either order: (a, b, c, d, e, f, g, h) + (w, c, x, y, f, z) = 
(a, b, w, c, d, e, x, y, f, g, h, z) and (w, c, x, y, f, z) + (a, b, c, d, e, f, g, h) = 
(w, a, b, c, x, y, d, e, f, z, g, h). 
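One way to realise the merge of Definition 5 is sketched below. It is an interpretation rather than a definitive implementation: "as far to the right as possible" is read as placing each run of new elements immediately before the next shared element, and any trailing run at the very end. The function name is an assumption; the two calls reproduce Example 1.

```python
def merge_orders(so1, so2):
    """Merge spider ordering so2 into so1 (both lists without repeats)."""
    common = set(so1) & set(so2)
    # The merge only makes sense if shared spiders occur in the same order.
    if [e for e in so1 if e in common] != [e for e in so2 if e in common]:
        raise ValueError("shared spiders appear in different orders")
    merged = []
    next_in_so2 = 0
    for element in so1:
        if element in common:
            position = so2.index(element)
            # New elements of so2 that must precede this shared element
            # go directly in front of it, i.e. as far right as allowed.
            merged.extend(so2[next_in_so2:position])
            next_in_so2 = position + 1
        merged.append(element)
    # Whatever follows the last shared element goes at the very end.
    merged.extend(so2[next_in_so2:])
    return merged


first = list("abcdefgh")
second = ["w", "c", "x", "y", "f", "z"]
print("".join(merge_orders(first, second)))  # abwcdexyfghz
print("".join(merge_orders(second, first)))  # wabcxydefzgh
```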

Note that we never have spider orders that have a pair of elements in common, 
but appearing in different orders, because we consider only spider chains de- 
fined by maximal chains of arrows. The following example shows the process for 
constructing a total order on the spiders in a diagram (and hence the default 
reading of the diagram). 

Example 2. The maximal length arrow chains in Fig. 15 are (f, g), (f, i), (h, g), 
(h, i). The starting spider for the chains (f, g) and (f, i) is x = S(f) and for (h, g) 
and (h, i) is y = S(h). Since there is more than one possible starting spider, 
the user needs to choose between them. Suppose x is chosen. Then (f, g) takes 
priority over (f, i) because (f, g) is a strongly bound pair and (f, i) is a weakly 
bound pair. The chain (f, g) defines the spider ordering (x, s). Next we consider 
the chain (f, i) because it also starts at x. This has spider ordering (x, s, t). 
Therefore the sequence of chains (f, g) followed by (f, i) has spider ordering 
(x, s) + (x, s, t) = (x, s, t). Since all remaining chains of arrows start at y, we 
next consider the strongly bound pair (h, i) which has the spider ordering (y, t). 
Merging gives the total order (x, s, t) + (y, t) = (x, s, y, t). 
5.4 The Default Reading 

We only require a total order on the spiders in each connected component of the 
dependence graph in order to define a reading. Essentially, we try to give this 
ordering by using a prioritized list of conditions on the syntax of the diagram in 
order to decide how to read the diagram. The main idea is to follow the chains 
of arrows around the diagram and use these to give the order of the spiders. The 
conditions help decide which chains of arrows to follow. 

The following algorithm is a suggested method for obtaining a default total 
order of the spiders, and hence a default reading, for the diagram: 

1. Consider the connected components of the dependence graph. 

2. Consider the maximal length chains of arrows. 

3. If all of these chains of arrows have the same starting spider, then this is the 
start of the total order of the spiders. 

4. Otherwise we need to choose (prioritized in this order) a starting spider: 

(a) If all of the chains with a certain starting spider respect the habitats 
of their spiders, but for every other starting spider there is a chain which 
does not respect the habitats of its spiders, then start at the spider whose 
chains respect the habitats. 

(b) Otherwise, ask the user: “Which of these spiders do you read first?” 

5. Given a starting spider, we need to choose a (maximal length) chain of arrows 
to read first: 

(a) Choose those chains which respect the habitats of their spiders over those 
that do not. 

(b) Suppose that two chains of arrows coincide initially, but when they first 
differ one has a strongly bound pair of arrows and the other has a weakly 
bound pair of arrows. Choose the chain with the strongly bound pair. 

(c) If there is more than one possible choice of arrow chain remaining then 
ask the user: “Which of these spiders is read next?” 

6. Continue choosing the chains of arrows with this starting spider until all of 
the associated spiders from any of these chains have been read. 

7. Merge the total orders of spiders arising from the chosen chains of arrows as 
shown in Example 2 in Sect. 5.3. 

8. Continue the process, beginning with choosing another starting spider, until 
all spiders appearing in any arrow chain have been read. 

9. Place any unused spiders (those that have no incoming or outgoing arrows) 
at the end of the total order, in any order. 

10. Use this total order of spiders to construct the semantics of the diagram. 
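Steps 5 to 9 of the algorithm above can be sketched for a single connected component as follows. This is a simplified illustration under several assumptions: the maximal chains, their spider orderings and their two flags are supplied as data; the order of starting spiders (steps 3 and 4, including any user choice) is given; step 5(b) is reduced to a single strongly-bound flag per chain; and no user interaction is modelled. The data for Fig. 15 follows Example 2, except that the entry for (h, g) and all of the habitat flags are assumed.

```python
def merge_orders(so1, so2):
    """Merge so2 into so1, placing new elements as far right as possible."""
    common = set(so1) & set(so2)
    if [e for e in so1 if e in common] != [e for e in so2 if e in common]:
        raise ValueError("shared spiders appear in different orders")
    merged, next_in_so2 = [], 0
    for element in so1:
        if element in common:
            position = so2.index(element)
            merged.extend(so2[next_in_so2:position])
            next_in_so2 = position + 1
        merged.append(element)
    merged.extend(so2[next_in_so2:])
    return merged


def default_spider_order(chains, starting_spiders, unused_spiders=()):
    """Build a total order of spiders from prioritized chains of arrows."""
    total = []
    for start in starting_spiders:
        candidates = [c for c in chains if c["spiders"][0] == start]
        # Step 5, simplified: habitat-respecting chains first, then strongly bound.
        candidates.sort(key=lambda c: (not c["respects_habitats"],
                                       not c["strongly_bound"]))
        for chain in candidates:
            # Step 6: skip chains that contribute no unread spider.
            if all(s in total for s in chain["spiders"]):
                continue
            total = merge_orders(total, chain["spiders"])  # step 7
    # Step 9: spiders with no arrows go at the end, in any order.
    total.extend(s for s in unused_spiders if s not in total)
    return total


# Data modelled on Example 2 (Fig. 15); the (h, g) entry and flags are assumed.
chains = [
    {"arrows": ("f", "g"), "spiders": ["x", "s"],
     "respects_habitats": True, "strongly_bound": True},
    {"arrows": ("f", "i"), "spiders": ["x", "s", "t"],
     "respects_habitats": True, "strongly_bound": False},
    {"arrows": ("h", "g"), "spiders": ["y", "t", "s"],
     "respects_habitats": True, "strongly_bound": False},
    {"arrows": ("h", "i"), "spiders": ["y", "t"],
     "respects_habitats": True, "strongly_bound": True},
]
print(default_spider_order(chains, ["x", "y"]))  # ['x', 's', 'y', 't']
```

With x chosen as the first starting spider, the sketch reproduces the total order (x, s, y, t) obtained in Example 2.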

6 Conclusion and Future Work 

In this paper, we have demonstrated a method which only requires an ordering 
of certain sets of spiders in a diagram in order to give a unique semantic inter- 
pretation. The default reading suggested here is based on the intuitive notion of 
following arrows in the diagram in order to give a reading. The details relate to 
the questions of where to start a reading and which chains of arrows to follow, 
in which order. 

One avenue of further work is to identify the situations in which choices 
are not required because the associated semantic statements are equivalent (for 
example, no choices are required if an independent piece of the diagram has no 
universal spiders). This will both improve the usability of the intended tool 
and possibly lead to an improved algorithm for choosing a reading. 

The efficacy of the default reading suggested here can be tested against ex- 
isting diagrams. In [9] a visual model of a video store is developed. The default 
reading gives the modellers’ intended meaning to all of the constraint diagrams 
appearing there. 

Usability trials of the notation, together with our suggestion for a default 
reading of a diagram, will be conducted. The results of these trials may affect the 
final version of the default reading. Furthermore, investigations of the reasoning 
process will occur, which may also lead to modifications of this reading. This 
process of deciding on a default reading and reasoning rules is necessarily an 
iterative one. 

Restricting a user to a default reading (whatever criteria are used to define 
that reading) is likely to reduce the expressiveness of a notation. Nevertheless, it is 
our intention to offer a default reading so that an inexperienced user can use 
the basic syntax and the tool can return the semantics of that diagram with 
the default reading. We also wish to allow a more experienced user to use 
the slightly more complex, but more expressive, notation which includes 
a reading tree. Reasoning is likely to occur using the basic syntax and the reading 
tree, and so applying reasoning rules may return diagrams which do not use 
the default reading. 

Exploration of the consequences of the default reading, and any user choices 
made during its construction, should be made available to the user. In general, 
a tool should enable a user to investigate the semantics of the diagram by inves- 
tigating examples of models. 

Tool support for the notation is already under construction, and work on 
possible conversions between constraint diagrams and other notations, such as 
OCL, is in progress. 
Abstract. We describe a three-stage method for drawing graph-enhanced Euler 
diagrams. The first stage is to lay out the underlying Euler 
diagram using a multicriteria optimizing system. The second stage is to find 
suitable locations for nodes in the zones of the Euler diagram using a force 
based method. The third stage is to minimize edge crossings and total edge 
length by swapping the location of nodes that are in the same zone with a 
multicriteria hill climbing method. We show a working version of the software 
that draws spider diagrams. Spider diagrams represent logical expressions by 
superimposing graphs upon an Euler diagram. This application requires an 
extra step in the drawing process because the embedded graphs only convey 
information about the connectedness of nodes and so a spanning tree must be 
chosen for each maximally connected component. Notations similar to Euler 
diagrams enhanced with graphs are common in many applications, and our 
method is generalizable to drawing hypergraphs represented in the subset 
standard, or to drawing higraphs where edges are restricted to connecting with 
only atomic nodes. 



1 Introduction 

The system described here links graph drawing and Euler diagram drawing into a 
system for drawing graph-enhanced Euler diagrams. In a graph-enhanced Euler 
diagram, we have a graph, an underlying Euler diagram, and a mapping from the 
graph nodes to the zones of the Euler diagram. In any drawing, the nodes are required 
to be included in the corresponding zone. We are, in effect, embedding graphs in 
Euler diagrams. Our approach is to draw the Euler diagram first, and later add the 
graph in a way that minimizes the number of edge crossings and total edge length in 
the graph. 

There are various application areas which can be visualized by such structures, and 
so benefit from the work described here, such as databases [3] and file system 
organization [2]. However, we show our system being used with a form of constraint 
diagram, the spider diagram [9]. This application area is in particular need of 
automatic layout for the diagrams because automatic reasoning algorithms produce 
abstract diagrams that have no physical layout. 



Contours: {a, b} 
Zones: {{}, {a}, {b}} 

Fig. 1. The distinction between an abstract Euler diagram and a corresponding drawn Euler 
diagram 



An Euler diagram is a collection of contours (drawn as simple closed curves), 
arranged with specific overlaps. The parts of the plane distinguished by being 
contained within some contours and excluded from other contours are called zones. 
The essential structure of an Euler diagram is encapsulated by an abstract Euler 
diagram. An abstract Euler diagram is made up of information about contours and 
zones. Contours at the abstract level are not drawn, but have distinguishing contour 
labels. Zones are not parts of the plane, but a partition of the contour set into 
containing contours and excluding contours. To clarify these concepts, Figure 1 
shows, first, an abstract Euler diagram, and, second, a drawn representation of the 
same Euler diagram. The shaded zone in the drawn diagram corresponds to the 
abstract zone {a} . 
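The abstract level can be captured in a very small data structure. The sketch below is illustrative only: the class and method names are assumptions, and it is written in Python for brevity rather than reflecting the authors' implementation. A zone is stored as the set of its containing contour labels, so its excluding contours are simply the remaining labels.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AbstractEulerDiagram:
    contours: frozenset   # contour labels
    zones: frozenset      # each zone is a frozenset of containing contour labels

    def excluding(self, zone):
        """Contours that the given zone lies outside."""
        return self.contours - zone

    def absent_zones(self):
        """Zones that could be formed from the contours but are not present."""
        labels = sorted(self.contours)
        every_zone = {frozenset(c)
                      for r in range(len(labels) + 1)
                      for c in combinations(labels, r)}
        return every_zone - self.zones


# The abstract diagram of Fig. 1: contours {a, b}, zones {{}, {a}, {b}}.
fig1 = AbstractEulerDiagram(
    contours=frozenset({"a", "b"}),
    zones=frozenset({frozenset(), frozenset({"a"}), frozenset({"b"})}),
)
print(fig1.excluding(frozenset({"a"})))  # the zone {a} excludes contour b
print(fig1.absent_zones())               # only the zone inside both a and b is absent
```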

Most graph drawing systems do not take account of regional constraints, where 
nodes must be contained within complex shapes. Simulated annealing can be an 
effective method of drawing graphs using a set of simple criteria [21]. These criteria 
are used to judge the aesthetic quality of the resulting layout. Each criterion can be 
weighted to change its importance. Such systems are applicable to embedding graphs 
in Euler diagrams when used with suitable aesthetic criteria. 

In graph-enhanced Euler diagrams, the absence of a zone from the second diagram 
in Fig. 2 would convey extra information, whereas, considered as hypergraphs, these 
two figures convey the same information. 





Fig. 2. Two equivalent hypergraph drawings which are different when interpreted as Euler 
diagrams 
Inspired by the widespread use of diagrammatic notations for modeling and 
specifying software systems, there has been much work recently on giving 
diagrammatic notations formal semantics. The analysis of a diagrammatic 
specification can be done using diagrammatic reasoning rules: rules to transform one 
diagrammatic assertion into a new diagram that represents an equivalent or weaker 
semantic statement. 

One such notation, and reasoning system, is that of constraint diagrams [16,5,19]. 
A simple subset of constraint diagrams, with a restricted notation and restricted rule 
system, is that of spider diagrams. Unitary spider diagrams are Euler diagrams with 
extra notation comprising shading in zones and a graph superimposed on the diagram. 
The components of the superimposed graph are trees (called spiders). Contours 
represent sets and zones represent subsets of those sets, built from intersection and 
exclusion. The absence of a zone from the diagram indicates that the set 
corresponding to that zone is empty. Thus the absence of a zone from the diagram 
conveys information, and the two diagrams in Figure 2 have different semantics. 

Each spider drawn on the diagram has a habitat: the collection of zones that 
contain nodes of the spider. Each spider asserts semantically the existence of an 
element in the set corresponding to its habitat. Spiders place lower bounds on the 
cardinality of sets. Shading in a zone (or collection of zones) indicates that the set 
corresponding to that zone (or zones) contains only the elements represented by the 
spiders that are in it, and no more. Shading places an upper limit on the cardinality 
of sets. See Figure 3 for an example of a spider diagram. 

The semantics of spider diagrams provide a foundation upon which we build 
reasoning rules. In the case of spider diagrams, there are seven rules which transform 
a spider diagram into another. For example, one rule transforms a diagram with an 
absent zone into the equivalent diagram which contains the zone, shaded. This 
reasoning rule changes the structure of the underlying Euler diagram and necessitates 
reconstruction of a drawn diagram. A sequence of reasoning rules, applied to a 
premise diagram, gives a proof which ends with a conclusion diagram. An example of 
such a proof is shown, drawn by hand, in Fig. 4. The same proof is shown again later 
in Figures 14-16. 

The seven reasoning rules each make a small change to a diagram, and they have 
been proven to be sound [20,23]. If a rule transforms diagram $d_1$ into diagram $d_2$ then 
$d_2$ represents a semantic consequence of $d_1$. Other rules could be devised which are 
sound, and in any logic system, the choice of rules is to some extent arbitrary. But 
these rules form a logically complete set. 

Contours: {a, b} 

Zones: {{}, {a}, {b}} 

Shading: {{a}} 

Spiders: {{{}, {b}}, {{a}}} 

Semantics: 

$|A| = 1$ and $A \cap B = \{\}$ and $|U - A| \geq 1$ 




Fig. 3. An abstract spider diagram and a corresponding drawn spider diagram 
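The informal semantics described above (absent zones are empty, spiders denote distinct elements in their habitats, shaded zones contain only spider elements) can be tested against small models by brute force. The sketch below is an illustration under those stated assumptions, not the formal semantics of [10,18]; the dictionary layout and function names are hypothetical.

```python
from itertools import combinations, permutations


def zone_elements(zone, contours, model, universe):
    """Elements inside every contour of the zone and outside all the others."""
    inside = set(universe)
    for label in contours:
        if label in zone:
            inside &= model[label]
        else:
            inside -= model[label]
    return inside


def satisfies(diagram, model, universe):
    contours, zones = diagram["contours"], diagram["zones"]
    labels = sorted(contours)
    # Absent zones must denote the empty set.
    for r in range(len(labels) + 1):
        for combo in combinations(labels, r):
            zone = frozenset(combo)
            if zone not in zones and zone_elements(zone, contours, model, universe):
                return False
    habitats = [set().union(*(zone_elements(z, contours, model, universe)
                              for z in spider))
                for spider in diagram["spiders"]]
    shaded = set().union(*(zone_elements(z, contours, model, universe)
                           for z in diagram["shading"]))
    # Look for distinct elements, one per spider, each inside its habitat,
    # that together account for every element of the shaded zones.
    for choice in permutations(sorted(universe), len(habitats)):
        if all(e in h for e, h in zip(choice, habitats)) and shaded <= set(choice):
            return True
    return False


# The abstract spider diagram of Fig. 3.
empty, za, zb = frozenset(), frozenset({"a"}), frozenset({"b"})
fig3 = {"contours": {"a", "b"}, "zones": {empty, za, zb},
        "shading": [za], "spiders": [[empty, zb], [za]]}

print(satisfies(fig3, {"a": {1}, "b": {2}}, {1, 2, 3}))        # True
print(satisfies(fig3, {"a": {1, 4}, "b": {2}}, {1, 2, 3, 4}))  # False: |A| = 2
```

The first model has exactly one element in A, none in both A and B, and at least one outside A, which matches the semantics listed for Fig. 3.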
The full spider diagram reasoning system allows for the manipulation and 
interpretation of compound spider diagrams: that is, expressions built up from spider 
diagrams using the propositional logic connectives “and” and “or”. This extension 
leads to many more reasoning rules, giving a sound and complete reasoning system, 
equivalent in its expressiveness to monadic first order predicate logic with equality. 
More details on the system, its rules and its expressiveness can be found in [10,18]. 

We have developed a tool [9,10] to assist users with the application of reasoning 
rules to transform diagrams. At the heart of this must be an algorithm to generate 
diagrams for presentation to the user as the outcome of a rule application. 



2 Related Work 

The task of drawing an Euler diagram - taking an abstract diagram and producing a 
corresponding drawn Euler diagram - is analogous to the task of graph drawing. 
Previous research has addressed some initial issues concerning the drawing of Euler 
diagrams. The paper [7] outlined well-formedness conditions on drawn diagrams and 
presented an algorithm to identify whether an abstract diagram was drawable subject 
to those conditions. If a diagram was diagnosed as drawable, then a drawing was 
produced. Later work, [8], sought to enhance the layout of a drawn Euler diagram 
using a hill-climbing approach in combination with a range of layout metrics to assess 
the quality of a drawing. There exists an Euler diagram drawing system [22] that 
embeds some small diagrams, which can be drawn with a limited subset of shapes. 

There has been some previous work in drawing extended graph systems. Clustered 
graph visualization systems are common (e.g. [4,14]), but in such structures the 
regions only nest and cannot intersect, hence they are not as expressive as Euler 
diagrams. There are a limited number of drawing methods for more complex graph- 
like structures such as hypergraphs and higraphs. Hypergraphs are similar to standard 
graphs, but with hyperedges rather than edges. Hyperedges connect to several nodes, 
in contrast with standard edges which connect at most two nodes. Hypergraphs are 
commonly represented in two ways: by the edge standard and the subset standard 
[17]. The edge standard draws hyperedges as lines, effectively adding a dummy node 
for each hyperedge, where the lines connecting to each node meet. The subset 
standard is a representation closer to enhanced Euler diagrams, where the hyperedges 
are indicated by closed curves surrounding the grouped nodes. However, there are 
still significant differences as hypergraph closed curves that intersect have no extra 
meaning, and current hypergraph drawing methods [1] emphasize node groupings, 
putting little emphasis on the layout of the curves. Hypergraphs with binary edges 
represented with the edge standard and non-binary edges represented with the 
subset standard are similar to commonly applied subsets of higraphs [13,15]. 

3 Drawing Euler Diagrams Enhanced With Graphs 

In this section we describe our three-stage generic method for laying out graphs in 
Euler diagrams. The software system has been implemented in Java. 

3.1 Stage 1: Euler Diagram Smoothing 

The basic process of drawing Euler diagrams in stage 1 has been detailed previously 
[8]. In outline, we first produce an initial diagram based on the zone specification as 
described in [7]. This results in a structurally correct, but not very well laid out, 
diagram. We then apply a multicriteria optimizer, which attempts to improve a 
weighted sum of various diagram layout criteria using a hill-climbing method. This 
adjusts the contours by both moving them and moving the individual points of the 
polygons that are used to represent them. After each single move, it checks the 
resulting layout for the presence of the correct zones and for whether the change has 
improved the weighted sum. We use several criteria for measuring diagram features, 
such as contour smoothness, contour size, zone area and contour closeness. The 
criteria and the hill climber are described in [8]. 
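The overall shape of such an optimizer can be sketched generically. The code below is not the Java system described in [8]; it is a minimal Python illustration in which the layout, the two criteria, the move generator and the validity check are all toy stand-ins, and a lower weighted sum is assumed to be better.

```python
import random


def weighted_sum(layout, criteria):
    """Combine the criteria into one score; lower is treated as better here."""
    return sum(weight * measure(layout) for weight, measure in criteria)


def hill_climb(layout, criteria, propose_move, is_valid, iterations=2000, seed=0):
    """Try single moves and keep one only if it is valid and improves the score."""
    rng = random.Random(seed)
    best, best_score = layout, weighted_sum(layout, criteria)
    for _ in range(iterations):
        candidate = propose_move(best, rng)
        if not is_valid(candidate):   # stands in for "the correct zones are present"
            continue
        score = weighted_sum(candidate, criteria)
        if score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy stand-in for one contour: distances of its polygon points from a centre.
def roughness(radii):
    n = len(radii)
    return sum((radii[k] - radii[(k + 1) % n]) ** 2 for k in range(n))


def size_error(radii, target=10.0):
    return (sum(radii) / len(radii) - target) ** 2


def nudge_one_point(radii, rng):
    k = rng.randrange(len(radii))
    moved = list(radii)
    moved[k] += rng.uniform(-1.0, 1.0)
    return tuple(moved)


def all_positive(radii):
    return all(r > 0 for r in radii)


start = (4.0, 15.0, 7.0, 12.0, 9.0, 3.0)
criteria = [(1.0, roughness), (0.5, size_error)]   # (weight, measure) pairs
start_score = weighted_sum(start, criteria)
final, final_score = hill_climb(start, criteria, nudge_one_point, all_positive)
print(start_score, final_score)
```

Changing a weight in `criteria` shifts the balance between the stand-in smoothness and size measures, which mirrors how weighting changes the importance of each layout criterion.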




Fig. 5. Nesting Euler diagrams 
This system has since been extended to deal with nested diagrams [11]. Nested 
Euler diagrams have subdiagrams entirely enclosed in a zone of a containing diagram. 
To draw a nested diagram, assuming we have a mechanism for drawing each atomic 
(non-nested) part independently, the first step is to identify, in the abstract diagram, 
which are the atomic components and which zones of containing diagrams each 
nested part belongs to. Each atomic component can be drawn and this tree-structure 
of drawn atomic components is combined into a single diagram as follows. For each 
zone which contains sub-diagrams, we find its bounding box, split it into a $j \times j$ grid 
and consider sequences of sub-boxes, width i, within the bounding box. The sub-boxes 
occupy a fraction $i/j$ of the bounding box, and are placed sequentially at $(j - i)^2$ 
positions scanning the whole bounding box (starting centrally). As j gets larger, the 
sub-boxes shrink and eventually one will be found which fits inside the zone. This 
sub-box is partitioned into disjoint boxes, within which the nested diagrams are 
inserted. This process is illustrated in Fig. 5. 

Once the nested diagram has been built in this way, the next step is to improve its 
appearance by smoothing. As the nesting can be arbitrarily deep, the amount of 
movement of polygons and polygon corners could be too large for very small nested 
contours. Hence, the amount of movement has been scaled to be proportional to the 
size of the contour (in fact, the bounding box of the contour) against the size of the 
whole diagram. 

The result of Stage 1 is normally a well laid out Euler diagram. The graph can then 
be superimposed as described in the following sections. 

3.2 Stage 2: Finding Locations for Nodes 

A node belonging to a particular zone must be placed such that the node is contained 
within the region defined by the drawn zone. Each concrete zone is defined by a 
sequence of line segments. We do not concern ourselves with disconnected zone 
areas, as these are not present in a well-formed [7] Euler diagram. However, for 
nested diagrams, at least one zone fails to be simply connected (i.e. it is ring-shaped, 
or worse; see Fig. 6). Zones which are simply connected (i.e. disc-like) have one 
polygon as their boundary, but non-simply connected zones have multiple polygons 
bounding them. In topology, a set is defined to be simply-connected if any path which 
starts and finishes at the same point can be continuously deformed until the path is 
constant at a point. A zone is simply connected if it is isotopic to a disc. 




Fig. 6. Two examples of non-simply-connected zones (shaded), drawn by our implementation 

A variety of possible strategies exist for the initial placement of a node inside its 
containing zone. We use a fast and simple method that is primarily concerned with 
ensuring that the node is contained inside the zone, regardless of how bad that 
placement is. Subsequent application of a force model refines the placement so that 
the node is not too close to any of the boundaries of the zone. The force model also 
ensures that all nodes sharing the same zone are reasonably spaced. 

The initial placement of a node requires a line to be drawn through the containing 
zone. For simplicity of implementation, this line is horizontal and passes through the 
bounding box of the concrete zone. The y-coordinate of the horizontal line is chosen 
randomly within the vertical range of the bounding box in order to give a scattering effect 
when there is more than one node present in a zone. By intersecting the bounding box 
horizontally, we can be certain that there is at least one subinterval of the line that is 
contained by the area of the concrete zone. 





Fig. 7. Candidate locations for a new node in zone a excluding b,c. The horizontal line is 
placed such that it intersects the bounding box of zone a at a random height. This diagram 
shows two subintervals where it is valid to place the new node 

An ordered set is built up from the intersection points of the horizontal line and the 
line segments which make up the boundary of the zone. This set must contain at least 
two points, and any location between the $(2n - 1)$th and $2n$th intersection point must 
belong to the zone (see Fig. 7). 
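
The initial placement just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Java implementation: the function names `scanline_intervals` and `place_node` are hypothetical, a zone is assumed to be given as a list of its boundary polygons, and the node is placed at the midpoint of a randomly chosen subinterval (the text does not say where within a subinterval the node goes).

```python
import random

def scanline_intervals(polygons, y):
    """Return the x-intervals where the horizontal line at height y is inside the zone.

    `polygons` lists the boundary polygons of one zone (more than one for a
    non-simply-connected zone); each polygon is a list of (x, y) vertices.
    """
    xs = []
    for poly in polygons:
        n = len(poly)
        for k in range(n):
            (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
            # Half-open test: a vertex shared by two edges is counted once,
            # and horizontal edges are never counted.
            if (y1 <= y < y2) or (y2 <= y < y1):
                t = (y - y1) / (y2 - y1)
                xs.append(x1 + t * (x2 - x1))
    xs.sort()
    # Between the (2n-1)th and 2nth intersection point we are inside the zone.
    return [(xs[k], xs[k + 1]) for k in range(0, len(xs) - 1, 2)]

def place_node(polygons, rng=random, max_tries=1000):
    """Pick a random height inside the bounding box and a point inside the zone."""
    ys = [y for poly in polygons for _, y in poly]
    low, high = min(ys), max(ys)
    for _ in range(max_tries):
        y = rng.uniform(low, high)
        intervals = [(a, b) for a, b in scanline_intervals(polygons, y) if b > a]
        if intervals:
            a, b = rng.choice(intervals)
            return ((a + b) / 2, y)  # assumption: midpoint of the subinterval
    raise ValueError("no interior point found; is the zone degenerate?")
```

For a square zone with a square hole, `scanline_intervals` returns two subintervals for any line passing through the hole, which corresponds to the situation shown in Fig. 7.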

The Stage 1 method for placing nested diagrams described in Section 3.1 could 
have been used for the initial placement of nodes. However, this node placement 
method is faster as we are placing a point rather than a shape with a bounding area 
and we are unconcerned about a central placing of the point, anticipating the 
refinement which is described next. 

After initial placement, refinement of node locations is achieved by applying a 
force model to the set $M$ of nodes in the zone. We introduce a repulsive force acting 
between each pair of nodes in the zone, causing them to become evenly distributed. 
This repulsive force is inversely proportional to the separation $d$, and proportional to 
the number of nodes, $|M|$, in the zone. A constant $c$ is used to affect the desired 
separation between pairs of nodes. This repulsive force is based on the force model 
by Fruchterman and Reingold [12] and is commonly used in force directed graph 
layout. 

Repulsive force between two nodes = $|M| \times \frac{c}{d}$. 

To prevent nodes escaping from a zone or getting undesirably close to the 
boundary of a zone, we make each line segment in the boundary of the zone exert a 
repulsive force on each contained node. It is desirable to let the set of nodes spread 
about a reasonably large area of the zone, however it is still essential to keep each 
node away from the line segments that define the zone. For this reason, we depart 
from the previously used force model and make the repulsive force acting on a node 
proportional to the inverse square of the distance from the line segment. This 
encourages nodes to spread over a reasonable area with very little chance of getting 
too close to a boundary due to the prohibitively high resultant forces. 




The repulsive force is proportional to $|M|^2$, as this helps to contain larger sets of 
nodes where there will be more node-node repulsions. As the zone may consist of an 
arbitrary number of line segments of arbitrary lengths, the repulsive force is also 
proportional to each length. 



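Repulsive force between a line segment and a node = $|M|^2 \times \frac{l}{d^2}$, 

where $l$ is the length of the line segment and $d$ is the distance from the node to the line segment. 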




We have observed that better results can be obtained when there are more line 
segments bounding a zone. We use a method that breaks a zone boundary into more 
line segments without affecting the region contained, typically until there are more 
than a hundred line segments. We use the simple method of dividing each 
existing line segment into two new line segments of equal length. The process is 
repeated until it yields enough new line segments. This reduces the chance of a node 
escaping from a corner of the zone when the force model is applied. 

The simulation of the force model is an iterative process. For each iteration, the 
resultant force acting on each node is the sum of all repulsive forces from the line 
segments of the containing zone and the repulsive forces from all other nodes in the 
same zone. After calculating all of the resultant forces, the location of each node is 
updated by moving it a small distance in the direction of the force. The distance of the 
movement is proportional to the magnitude of the force. After a number of iterations, 
the system nears an equilibrium and the nodes occupy their new locations. 
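
The refinement stage can be sketched in Python as below. This is an illustration under assumptions rather than the authors' Java code: the names `subdivide`, `resultant_forces` and `simulate` are hypothetical, the node-node force is taken as |M| c / d and the segment-node force as |M|^2 l / d^2 (our reading of the proportionalities stated in the text), distance to a segment is measured to its closest point, and the values of `c`, `step` and `iterations` are arbitrary.

```python
import math

def subdivide(segments, min_count=100):
    """Halve every boundary segment repeatedly until there are enough segments."""
    segments = list(segments)
    while segments and len(segments) < min_count:
        halved = []
        for (x1, y1), (x2, y2) in segments:
            mid = ((x1 + x2) / 2, (y1 + y2) / 2)
            halved += [((x1, y1), mid), (mid, (x2, y2))]
        segments = halved
    return segments

def closest_point(p, seg):
    """Closest point to p on a (non-degenerate) segment."""
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (x1 + t * dx, y1 + t * dy)

def resultant_forces(nodes, segments, c):
    """Sum node-node and segment-node repulsions for every node in one zone."""
    m = len(nodes)
    forces = []
    for i, p in enumerate(nodes):
        fx = fy = 0.0
        for j, q in enumerate(nodes):
            if i != j:
                dx, dy = p[0] - q[0], p[1] - q[1]
                d = math.hypot(dx, dy) or 1e-9
                f = m * c / d                      # |M| * c / d
                fx, fy = fx + f * dx / d, fy + f * dy / d
        for seg in segments:
            q = closest_point(p, seg)
            dx, dy = p[0] - q[0], p[1] - q[1]
            d = math.hypot(dx, dy) or 1e-9
            f = m * m * math.dist(*seg) / (d * d)  # |M|^2 * l / d^2
            fx, fy = fx + f * dx / d, fy + f * dy / d
        forces.append((fx, fy))
    return forces

def simulate(nodes, segments, c=1.0, step=0.01, iterations=200):
    """Move every node a small distance along its resultant force, repeatedly."""
    nodes = list(nodes)
    for _ in range(iterations):
        forces = resultant_forces(nodes, segments, c)
        nodes = [(x + step * fx, y + step * fy)
                 for (x, y), (fx, fy) in zip(nodes, forces)]
    return nodes
```

All forces are computed before any node is moved, matching the description above. The sketch performs no containment check, so a node placed very close to the boundary could still be pushed outside the zone by a large step.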

3.3 Stage 3: Laying Out the Edges 

The previous stage calculates locations for nodes. We can think of these locations as 
being candidate locations for the set of nodes in the zone, and we are free to swap the 
location of pairs of nodes, within a zone, without changing the meaning of the 
diagram (see Fig. 9). Swapping pairs of nodes changes the location of edges 
emanating from those nodes. We use a simple hill climbing approach on this with two 
metrics to improve the quality of the diagram. 

One desirable feature of a diagram is to have a minimal number of edge crossings. 
Our first metric returns the number of edge crossings in the current diagram, so 
values closer to zero will represent a better quality of layout in terms of edge 
crossings. To further enhance the understandability of the diagram, we introduce a 
second metric, which is based on the length of edges in the diagram. Shorter edges 
make graphs easier to navigate and identify, so the value returned by this metric will 
represent an improvement in the layout if the value is closer to zero. The value 
returned is the sum of each edge length squared. 

In our current system, we are only concerned with simple straight-line edges, 
although it is worth noting that our software can deal with non-simple edges. For 
example, some notations use curves or shapes to represent special edges and our 
system is able to detect intersections with these nonlinear edges. 





Fig. 9. A diagram with 4 edge crossings (left) and the same diagram produced using the hill 
climber, with no edge crossings (right). Notice the common locations for all nodes. The right 
hand diagram has had 3 pairs of nodes swapped, in zones b, ab and abc 

As the value returned by the edge length metric is based on the sum of edge 
lengths squared in the diagram, we make this value dimensionless by dividing it by 
the area of the diagram. This makes the metric return the same value for a particular 
diagram, regardless of the scaling. 

The two metrics are combined as a weighted sum to work out the current quality of 
a diagram. As we have determined minimization of edge crossings to be the most 
important factor, we apply a much higher weighting to this metric. That is, we are 
unlikely to reduce the total edge length in a diagram at the expense of introducing a 
new edge crossing. 
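
A minimal Python sketch of the combined metric follows. It is illustrative only: `edges_cross` and `quality` are hypothetical names, edges are assumed to be straight segments between node positions, edges that share an endpoint are assumed not to count as crossing, and the default weights are the values of 1 and 10^-3 reported for the implementation.

```python
from itertools import combinations

def _orient(p, q, r):
    """Twice the signed area of triangle pqr (positive if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def edges_cross(e, f):
    """True if two straight edges properly cross (shared endpoints do not count)."""
    (a, b), (c, d) = e, f
    if len({a, b, c, d}) < 4:
        return False
    return (_orient(a, b, c) * _orient(a, b, d) < 0 and
            _orient(c, d, a) * _orient(c, d, b) < 0)

def quality(positions, edges, area, w_cross=1.0, w_len=1e-3):
    """Weighted sum of edge crossings and dimensionless squared edge length.

    Lower values indicate a better layout.
    """
    segs = [(positions[u], positions[v]) for u, v in edges]
    crossings = sum(edges_cross(e, f) for e, f in combinations(segs, 2))
    # Dividing by the diagram area makes the length term independent of scale.
    length = sum((ax - bx) ** 2 + (ay - by) ** 2
                 for (ax, ay), (bx, by) in segs) / area
    return w_cross * crossings + w_len * length
```

With these weights, removing one crossing always outweighs any increase in the length term that stays below 1 after weighting.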




Fig. 10. A diagram demonstrating the different types of edges that are supported by our system. 
Intersections with the more complicated types of edges can still be computed 




Crossings: 0.0000   Length: 0.0000   Total: 0.0000 
Crossings: 1.0000   Length: 0.1143   Total: 1.1143 
Crossings: 0.0000   Length: 0.0940   Total: 0.0940 
Crossings: 0.0000   Length: 0.1515   Total: 0.1515 

Fig. 11. Total quality metrics for some graph-enhanced diagrams 



In our implementation of the system, we use a weighting of 1 for the edge crossing 
metric. The weighting of the edge length metric is relative to this and is chosen such 
that when the returned value is multiplied by the weighting, the value is typically less 
than 1. Larger values may allow total edge length to be reduced at the expense of 
introducing new edge crossings. Our implementation uses a weighting of $1 \times 10^{-3}$ for 
the edge length metric. Some examples of “quality” values are illustrated in 
Fig. 11. 

The hill climber is also an iterative process and runs for either a fixed number of 
iterations, or a user may interact with the process and apply more iterations if it is 
deemed necessary. Each iteration begins with selecting a random zone that contains 
more than one node. A random pair of nodes is selected from this zone and their 
locations are swapped. This does not alter the meaning of the diagram, as they both 
lie within the same zone. If the new quality of the diagram is worse than before, the 
nodes are swapped back to their original locations; otherwise, the change is kept. 
After a number of iterations, the quality of the diagram according to the metrics 
improves. The effect of the hill climber can be seen in the last image in Fig. 12. 
Smoothed versions of these diagrams are shown in Fig. 13. 
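
The swap-based hill climber can be sketched in Python as follows. This is a sketch under assumptions, not the implementation itself: `hill_climb` is a hypothetical name, node positions are held in a dictionary, `zone_of` maps each node to its zone, and `quality` is any callable returning a value where lower is better.

```python
import random

def hill_climb(positions, zone_of, quality, iterations=1000, rng=random):
    """Swap the locations of random same-zone node pairs, undoing worsening swaps."""
    positions = dict(positions)  # work on a copy
    zones = {}
    for node, zone in zone_of.items():
        zones.setdefault(zone, []).append(node)
    # Only zones with more than one node offer a possible swap.
    candidates = [nodes for nodes in zones.values() if len(nodes) > 1]
    if not candidates:
        return positions
    best = quality(positions)
    for _ in range(iterations):
        u, v = rng.sample(rng.choice(candidates), 2)
        positions[u], positions[v] = positions[v], positions[u]
        new = quality(positions)
        if new > best:
            # Worse than before: swap the nodes back.
            positions[u], positions[v] = positions[v], positions[u]
        else:
            best = new
    return positions
```

Because only locations within one zone are exchanged, the set of locations used in each zone never changes, which is why the meaning of the diagram is preserved.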






[Figure panel: After force directed placement and hill climbing] 

Fig. 12. The effect of using the force directed node placement and hill climber on graphs being 
embedded into an Euler diagram that has been drawn automatically 

[Figure panels: Initial Euler diagram; Initial node placement; After force directed placement and hill climbing] 

Fig. 13. Embedding the previous graphs into the same Euler diagram laid out with the 
smoothing system. It is easier to distinguish between the curved contours and the straight edges 


4 Drawing Spider Diagrams 

In this section we describe how we apply our method to spider diagrams. The method 
is essentially that described in Section 3, where we describe a method to draw graphs 
on Euler diagrams, except that spider diagrams do not have arbitrary graphs 
connecting nodes. Instead, nodes are connected in spanning trees, and the manner in 
which the nodes are connected in the tree is not significant. The abstract syntax of 
spider diagrams expresses spiders purely in terms of their habitat: a collection of 
zones. A spider whose habitat comprises three zones, $z_1$, $z_2$, $z_3$, can be drawn with a 
graph edge (the spider's leg) drawn between graph nodes (the spider's feet) in $z_1$ and 
$z_2$ and a second leg between graph nodes (the spider's feet) in $z_2$ and $z_3$. An alternative 
drawing might draw legs between $z_3$ and $z_2$ and between $z_1$ and $z_3$. Only once a spider 
is drawn do we know which of its feet have a leg between them. As we only have the 
information about which sets of nodes are connected, our drawing method needs an 
additional process that develops a tree between the nodes. 

Once the feet for each spider have been placed, it is possible to use Prim's or 
Kruskal's algorithm to form a minimal spanning tree. This completes the concrete 
representation of the spider with the smallest total edge length, but does not take into 
account edge crossings. As our hill climbing method gives preference to changes that 
reduce edge crossings, we do not create a minimal spanning tree, but trivially form a 
chain of spider legs that connect each spider foot. 
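
The two ways of joining a spider's feet can be compared with a short Python sketch. It is illustrative only: `chain` and `prim_mst` are hypothetical names, feet are assumed to be given as a list of already placed (x, y) points, and the spanning tree is built with a simple quadratic version of Prim's algorithm.

```python
import math

def chain(feet):
    """Join the feet in the order given: foot 1 to foot 2, foot 2 to foot 3, ..."""
    return [(feet[k], feet[k + 1]) for k in range(len(feet) - 1)]

def prim_mst(feet):
    """Minimal spanning tree over the feet (Prim's algorithm, Euclidean lengths)."""
    if not feet:
        return []
    in_tree, rest, legs = [feet[0]], list(feet[1:]), []
    while rest:
        # Cheapest leg from a foot already in the tree to one not yet in it.
        u, v = min(((a, b) for a in in_tree for b in rest),
                   key=lambda leg: math.dist(*leg))
        legs.append((u, v))
        in_tree.append(v)
        rest.remove(v)
    return legs
```

The chain is generally longer than the spanning tree, but the feet can still be swapped within their zones afterwards, so the hill climber gets the opportunity to reduce both crossings and length.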

In [18,10], spider diagrams are given semantics, and diagrammatic reasoning rules. 
A reasoning rule transforms one diagram into another, whose semantics are a logical 
consequence of the premise diagram semantics. A proof in the spider diagram 
reasoning system is simply a sequence of these diagrammatic transformations, which 
could be elicited from a user, with a software tool assisting in the valid application of 
reasoning rules. Alternatively, proofs can be automatically generated between given 
premise and conclusion diagrams [9,10]. An example of an automatically generated 
proof is shown in Figs 14, 15 and 16, where the rules “Add Shaded Zone” and “Add 
Spider Foot” have been applied. The first rule changes the underlying Euler diagram, 
and the second rule changes the superimposed graph. Without any results on drawing 
spider diagrams, the proof can only be presented in its abstract form (Fig. 14). The 
preliminary work on drawing can present the proof with correct but unappealing 
diagrams (Fig. 15). After combining the algorithm described in this paper with the 
previous work on smoothing [8], the proof is presented in a most readable fashion 
(Fig. 16). 

Fig. 14. An abstract proof 

The final position of spider feet in an automatic layout often depends on their 
initial placement before the force model is invoked. We sometimes observe nodes 
getting stuck in locally minimal energy states, where they are unable to move 
elsewhere in a zone. To escape from the local minima, simulated annealing could be 
employed to make nodes periodically jump a larger distance to see if it is beneficial to 
the energy level in the force model. Fig. 17 illustrates one example of a bad layout 
caused by local minima. 

Fig. 16. A drawn proof with smoothed diagrams 




Fig. 17. Different layouts for the same graphs in an Euler diagram. The bad automatic layout 
occurs when both nodes in zone a are initially placed close to each other and reach a local 
minimum while the force model is being simulated. In this case, it is not possible to reduce edge 
crossings without moving nodes 

[Figure panel: Automatic layout (bad)] 




5 Conclusions and Further Work 

We have presented a method for automatically embedding graphs in Euler diagrams. 
The Euler diagrams are laid out using a multicriteria optimizing system. Nodes are 
placed at initial locations before being refined with a force model that involves 
interactions between zone boundaries and other nodes in the same zones. Finally, 
edge crossings and total edge length in the graphs are reduced without changing the 
meaning of the diagram, using a hill climbing approach. We have also specialized our 
method to apply to the syntax of spider diagrams and we demonstrate a software tool 
that draws automatically generated proofs in a spider diagram reasoning system. 

The current implementation of the force model for placing nodes does not 
guarantee in all cases that nodes will remain inside the correct Euler zones. Allowing 
nodes to move out of their zones is very undesirable, as this would change the 
structure and meaning of the diagram. We are confident that in all but special cases 
(those where the nodes are initially placed very close to zone borders and with other 
nodes nearby) the force model maintains the node locations, but it would be relatively 
simple to add a structure check to the diagram after each iteration of the force model. 
This would check that the nodes remain in the correct part of the diagram. If a node 
movement had changed the structure by moving outside of the correct zone then the 
node could either be placed back where it originated or placed randomly in the 
correct zone. 

At the moment, the optimization of the graph layout relies on swapping nodes that 
are in the same zone. A further addition to this for spider diagrams is to change the 
spanning tree of a spider as a move in the hill climber, in an attempt to improve edge 
crossings and edge length of the final graph. We feel that this can improve the layout 
of spiders. 

The example of a proof shown in Figs 14, 15 and 16 was chosen carefully, to ensure 
that all the intermediate diagrams are drawable (a property of Euler diagrams, as 
described in [7]). More work needs to be done to resolve, and draw, Euler diagrams 
that are currently diagnosed as “undrawable”. Resolution will require the drawing of 
diagrams which use multiple crossing points between contours, and may even allow 
different contours to share a concurrent path. There are usability drawbacks to these 
kinds of diagram syntax, but perhaps even more serious usability drawbacks if we can 
create no drawing at all for a proof step. The smoothing approach would need to be 
adapted in the case that multiple contours were allowed to pass through the same 
point, for example. Without adapting the algorithm, the concurrent contours would 
inevitably “pull apart”, creating extra zones which change the underlying diagram 
structure. This is an issue with the applicability of drawing Euler diagrams and not 
directly relevant to this paper. 

Another important question raised by the proof-presentation application is that of 
continuity of proofs. When a diagram transformation is made (a new zone is added or 
a new spider foot is joined), the transition is best understood if the concluding 
diagram closely resembles the preceding diagram, highlighting only a local change. 
We seek to maintain the mental map between the dynamic visualizations at each 
proof step. There are several possible ways to achieve this. One method is to include 
a mental map criterion across all diagrams when performing hill climbing at both the 
Euler Diagram and node location stages. Another method is to draw the first diagram 
nicely, and then attempt to draw subsequent diagrams incrementally to remain as 
close to previous ones as possible. It is also worth noting that stages 2 and 3 of the 
current system could be combined to form a meta-heuristic, which may simplify 
mental map preservation. 

This task of drawing one diagram given another diagram which is structurally 
similar, generalizes to drawing given a context which is a library of drawn examples. 
Creating drawings in such a context could allow a tool to learn about user preferences 
in diagram layout. 

Abstract. Liar puzzles have been popularized by Raymond Smullyan 
in several books. This paper presents a logical and diagrammatic 
examination of such puzzles in terms of epistemic truth values. Also, 
non-monotonic reasoning may occur as new information is learned about 
a puzzle. This paper presents a way to think about such non-monotonic 
reasoning which does not involve the use of a non-monotonic logic but 
instead utilizes context shifts among static logics. The information 
coming from the presented diagrams is timeless; it is a monotonic backbone 
of the whole non-monotonic knowledge. 

Keywords: Puzzles, diagrammatic reasoning, non-monotonic reasoning 



1 Introduction 

Raymond Smullyan has popularized certain “liar puzzles” in some of his books 
([Smu78], [Smu98]). An analysis of The Liar paradox was done in [BE87] using 
non-well-founded set theory. In this paper, we explore liar type puzzles using 
a notion from Nuel Belnap in [Bel75] and [Bel76] where a four valued semantics 
is used. The particular interpretation of the four values has been changed 
to fit these puzzles, and two additional values added. To solve a puzzle is to 
present a particular approximable mapping from the elements of the puzzle to 
a six valued lattice. The process of solving the puzzle is captured by increasing 
information about the evaluation function. 

An interesting application of non-monotonic reasoning occurs when one re- 
ceives more information about a puzzle. From Barwise and Seligman, [BS97], 
non-monotonic reasoning can be captured in logic not by presenting a partic- 
ular non-monotonic logic but rather by shifting to a new logic where different 
constraints are applicable. 

By a new logic, what is meant is that the language remains the same, but 
either the interpretations or the atomic facts have changed. The same six values 
are used to evaluate the new and old logics. 

* This research was conducted while the second author was at Indiana University and 
while on his own time with NRL. 

Consider the pairs of values 

(Never Telling Truth, Lying)   (Telling Truth, Lying)   (Telling Truth, Never Lying) 

(Unsure If Telling Truth, Lying)   (Telling Truth, Unsure If Lying) 

(Unsure If Telling Truth, Unsure If Lying) 

These are “told” or epistemic values. They are used when reasoning in the pres- 
ence of incomplete information, i.e., when the information is accessible but not 
yet known. They are also used for managing non-uniform information, i.e., both 
Telling the Truth and Lying. They can be arranged in the following structure 
where the pair x,y for x,y ∈ {0, 1} refers to the truth and lying respectively: 



increasing 
information 
    ^ 
    |      0,1        1,1        1,0 
    | 
    |          ⊥,1        1,⊥ 
    | 
    |               ⊥,⊥ 
    +-----------------------------------> increasing truth 

The known information content increases from bottom to top and known log- 
ical truth increases from left to right. There are two partial orders here, the 
information order and the logical order. The information order does not include 
relations among 0,1 and 1,1 and 1,0. The values in the diagram are used to tag 
individuals in the puzzle; they are not used to tag an individual’s statements. 
A statement made by an individual is either true or false. However, if an indi- 
vidual is known to be of type 1,0, then all his statements are also known to be 
true since s/he can never lie. 

Let us consider the following puzzle from Smullyan [Smu78], puzzle number 
40. Knights always tell the truth, knaves always lie, and normal persons can do 
either. Two people, A and B , each of whom is either a knight, knave, or normal 
person, make the following statements: 

A: B is a knight. 

B : A is not a knight. 

Prove that at least one of them is telling the truth, but is not a knight. The 
solution in the book has two cases: 

Case 1: A is telling the truth. Then B indeed must be a knight, in which case 
what he says about A is true. Hence A is not a knight. So A is telling the 
truth but is not a knight. 

Case 2: A is lying, therefore B is not a knight. But since A is not telling the 
truth, A is not a knight. So what B is saying about A is the truth, but B is 
still not a knight. 
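
The two cases can also be checked exhaustively. The following Python sketch enumerates every assignment of types to A and B under the usual reading of the puzzle (a knight's statement is true, a knave's is false, a normal person's may be either); the names `consistent` and `solve_puzzle_40` are ours and purely illustrative.

```python
from itertools import product

TYPES = ("knight", "knave", "normal")

def consistent(kind, statement_is_true):
    """Can a person of this kind have made a statement with this truth value?"""
    if kind == "knight":
        return statement_is_true
    if kind == "knave":
        return not statement_is_true
    return True  # a normal person may say either

def solve_puzzle_40():
    """Return all consistent (type of A, type of B, A truthful?, B truthful?)."""
    solutions = []
    for a, b in product(TYPES, repeat=2):
        a_true = (b == "knight")   # A: "B is a knight."
        b_true = (a != "knight")   # B: "A is not a knight."
        if consistent(a, a_true) and consistent(b, b_true):
            solutions.append((a, b, a_true, b_true))
    return solutions
```

Three assignments survive: A knave with B normal, A normal with B knight, and both normal. In each of them at least one speaker tells the truth without being a knight, which is the claim to be proved.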

The diagram for this puzzle at the outset (see [Nag03] where diagrams for 
some types of truth-teller - liar puzzles were introduced), i.e., merely recording 
the initial information contained in the puzzle without making any further 
inferences, is the diagram below: 



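[Diagram: nodes A and B, with a solid arrow from A to B labelled “is a knight” and a dotted arrow from B to A labelled “is not a knight”] 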



The solid arrow means that A has said something affirmative about all 
of B's statements. And, in fact, that solid arrow has the “value” of A’s sentence 
about B. This value is not a truth value but instead is an affirmative 
proposition that A utters about B, i.e., “says that”. A could be either telling 
the truth or lying in this instance. 

The dotted line means that B has said something negative about A’s state- 
ments, and B could either be lying or telling the truth. Notice that B does not 
go far enough to claim that A is a knave. 

The “solution” to the puzzle is really two solutions without enough informa- 
tion to choose between them. Diagrammatically, let us use the notation A : x,y 
to indicate where in the structure of values each node falls. The first case begins 
with the diagram on the left: 



[Diagram, left: A: ⊥,⊥ and B: ⊥,⊥, with a solid arrow from A to B labelled “is a knight” and a dotted arrow from B to A labelled “is not a knight”] 

[Diagram, right: the same arrows, with A: 1,⊥ and B: 1,0] 

Assume now that A is telling the truth, then B's node is valued as the diagram 
indicates on the right. However, if this is the case, then B saying that A does not 
always tell the truth must be the case and the dotted line promotes A’s lying 
value to 1 yielding the following diagram: 



[Diagram: A: 1,1 and B: 1,0, with a solid arrow from A to B labelled “is a knight” and a dotted arrow from B to A labelled “is not a knight”] 



That is, A both tells the truth and lies. The second case starts with the diagram 
(as before) on the left (below): 



[Diagram, left: A: ⊥,⊥ and B: ⊥,⊥, with a solid arrow from A to B labelled “is a knight” and a dotted arrow from B to A labelled “is not a knight”] 

[Diagram, right: the same arrows, with A: ⊥,1 and B: ⊥,1] 

Since he’s lying about B’s being a knight and always telling the truth, B must 
in at least one case be lying, hence he gets a 1 lying value. However, because B 
is telling the truth about A’s lying in this instance, then B’s truth value must 
be promoted to one: 

[Diagram: the same arrows, with A: ⊥,1 and B: 1,1] 

Notice that in both cases, A can be lying about “I am not a knight”. If s/he does 
say this statement then there is a “self” dotted line regardless of what values B 
must have: 



[Diagrams: in each, A has a dotted self-arrow labelled “is not a knight” and a solid arrow to B labelled “is a knight”, and B has a dotted arrow to A labelled “is not a knight”. Left: A: ⊥,1. Right: A: 1,1. B is unvalued in both.] 

However, the fact that A can lie is a true statement, hence his value can be promoted 
to 1,1 regardless how B is valued. The common fact in the two solutions 
is that A must be lying about something and B must be telling the truth about 
something. 

Since these diagrams are all parametric in A and B, there is a diagrammatic 
rule of the form: 



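[Diagram rule: from the unvalued diagram (solid arrow from A to B labelled “is a knight”, dotted arrow from B to A labelled “is not a knight”) we obtain the same diagram with A: ⊥,1 and B: 1,⊥] 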

The moral here is that the diagrams are able to tease out common or universal 
knowledge without resorting to a case analysis. Put in other words, the case 
analysis is built into the rules since for all problems where the rule applies, the 
case analysis is the same. Incidentally, this particular piece of information is not 
contained within Smullyan’s answer and it is not clear he overtly knew it was 
hidden there. 

Another rule is the following: 



[Diagram rule: a node A with a dotted self-arrow labelled “is not a knight” yields the same diagram with A: 1,1] 



The reasoning is as follows: suppose A is making a true statement, then he has 
uttered a truth about the fact that he also has made a false statement. Hence A 
must be valued 1,1. On the other hand, suppose A is making a false statement 
that he is not a knight. Then he must indeed be a knight, but this is impossible 
since he’s uttered a false statement. So, A is making a true statement about the 
fact he sometimes lies. 
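
The same argument can be replayed mechanically for any self-referential statement. The sketch below is an illustration under the usual knight/knave/normal reading; `admissible_types` is a hypothetical helper that takes the statement as a function from the speaker's own type to a truth value.

```python
TYPES = ("knight", "knave", "normal")

def admissible_types(statement):
    """List (type, truth of statement) pairs that are consistent for the speaker."""
    result = []
    for kind in TYPES:
        truth = statement(kind)
        if kind == "knight" and not truth:
            continue  # a knight cannot utter a falsehood
        if kind == "knave" and truth:
            continue  # a knave cannot utter a truth
        result.append((kind, truth))
    return result

# "I am not a knight": only a normal person can say it, and it is then true.
not_a_knight = admissible_types(lambda kind: kind != "knight")

# "I am a knave": only a normal person can say it, and it is then false.
a_knave = admissible_types(lambda kind: kind == "knave")
```

In both cases the only surviving type is normal, which is what the value 1,1 records.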

There are two other arrows that were not needed in the above puzzle. 



[Diagram: two further arrow styles, labelled “is a knave” and “is not a knave”] 




The labels on the arrows can now be dispensed with as the shape of the arrow 
carries all the important information. 

Example 1. Assume the atomic sentences “can say truth” and “can lie”, but with 
a restriction: if someone can say a truth, then s/he must say at least one true 
statement, and if s/he can lie, then s/he utters at least one false statement. 
There are 4 persons; the problem is to determine who can lie and who can say 
the truth. 

A : B is not a knave. 

B : I (myself) and C are not knights. 

C : A and D are not knaves. 

D : Everybody is not a knight. 

The graphs for each of the statements are as follows (on the left): 




The solution is the graph on the right. The self arrows around B and D instantly 
promote their values to 1,1. Since A has uttered a true statement about B, 
he must at least be valued at 1,⊥. And similarly, C must too be valued at 1,⊥. 
Via our assumption, A and C are knights (they have only true statements). 
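
Example 1 can be verified by brute force. The Python sketch below encodes the stated restriction as follows (this encoding is our assumption): each compound statement is split into its atomic parts, and a person is a knight exactly when all of his or her atomic statements are true, a knave exactly when all are false, and normal when both kinds occur. The names `atoms`, `kind_from` and `solve_example_1` are illustrative.

```python
from itertools import product

TYPES = ("knight", "knave", "normal")
PEOPLE = "ABCD"

def atoms(t):
    """Truth values of each person's atomic statements under the assignment t."""
    return {
        "A": [t["B"] != "knave"],                    # B is not a knave
        "B": [t["B"] != "knight", t["C"] != "knight"],  # B and C are not knights
        "C": [t["A"] != "knave", t["D"] != "knave"],    # A and D are not knaves
        "D": [t[p] != "knight" for p in PEOPLE],     # nobody is a knight
    }

def kind_from(truths):
    """Type forced by the restriction: only true, only false, or mixed statements."""
    if all(truths):
        return "knight"
    if not any(truths):
        return "knave"
    return "normal"

def solve_example_1():
    """Return every assignment in which each person's type matches what they said."""
    solutions = []
    for combo in product(TYPES, repeat=len(PEOPLE)):
        t = dict(zip(PEOPLE, combo))
        said = atoms(t)
        if all(kind_from(said[p]) == t[p] for p in PEOPLE):
            solutions.append(t)
    return solutions
```

Under this encoding the search returns a single assignment, with A and C knights and B and D normal, in agreement with the solution graph.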



2 Basic Theory 

There are certain features of the diagrammatic system which make understanding 
the theory of these puzzles, which will be dubbed KNK puzzles, easier. In 
particular, one can show that unless some extra assumptions are made about 
the underlying puzzle, then any person in the puzzle valued at 1,⊥ or ⊥,1 never 
has the ⊥ promoted to a 0. 



2.1 Basic Definitions 

Definition 1. Atomic statements are of four types: “... can lie”, “... can 
say the truth”, “... cannot lie” and “... cannot say the truth”. 

Smullyan would call people of whom the first two atomic statements are true 
“Normal”, or at least not known to be completely truth telling or complete liars. 
The second two atomic statements are true of Knights and Knaves, respectively, 
and they can be rephrased as “... must say the truth” and “... must lie”. 

There is an important point which bears noting. The collection of statements 
uttered by the people in the puzzle need not exhaust all of their statements. To 
consider the collection uttered to be all their statements is to impose a “closed 
world” condition. So it is possible for a person to have only uttered one true 
statement but still only have the value 1,⊥ simply because s/he is not known 
to have uttered a false statement. 

Definition 2. A KNK-puzzle is one where there are some persons and they 
say some of the aforementioned atomic statements. A Knight’s atomic statement 
must be true. A Knave’s atomic statement must be false. A Normal person, 
between these two, can say both true and false atomic statements. Usually 
the goal is to “solve the puzzle”, which means determining the type of each 
person. 

Definition 3. A KNK-puzzle is clear if we have no other information to solve 
it than the given atomic sentences. 

2.2 Basic Lemma 

Lemma 1. Every clear KNK-puzzle has a trivial solution in which everybody is 
Normal. 

Proof. Consider a person in a clear KNK-puzzle. S/he can make either true or 
false statements. It is only possible to check each statement to see if it leads 
to contradiction when the statement is assumed true or false. Hence the only 
value one can determine for the person is one consistent with 1,⊥ or ⊥,1, i.e., 
that they told the truth or lied in this particular instance, or if s/he remained 
silent (⊥,⊥). By induction on the number of nodes in the puzzle, there can only 
be the values 1,⊥, ⊥,1, and 1,1 assigned to any node. And the first two can 
consistently be promoted to 1,1 simply because they could say something, which 
is not recorded in the puzzle, which causes the ⊥ to be promoted to a 1. □ 
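
The promotion used in this proof can be made concrete with a small Python sketch of the six values and their information order. The representation is an assumption made for illustration: ⊥ is encoded as `None`, a value is a pair (truth, lying), and the names `info_leq` and `promote_to_normal` are ours.

```python
BOT = None  # stands for ⊥: not yet known

# The six values, as (truth, lying) pairs.
VALUES = [(BOT, BOT), (1, BOT), (BOT, 1), (0, 1), (1, 1), (1, 0)]

def info_leq(v, w):
    """v is below w in the information order: w agrees with v wherever v is known."""
    return all(a is BOT or a == b for a, b in zip(v, w))

def promote_to_normal(v):
    """Promote every ⊥ to 1; components already known (0 or 1) are never changed."""
    return tuple(1 if a is BOT else a for a in v)
```

Promoting 1,⊥ or ⊥,1 in this way always gives 1,1, which lies above both in the information order; this is the trivial solution of the lemma. A value such as 1,0 is left untouched, since known components cannot be revised.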

An extra assumption is needed to avoid the trivial solution, or as in many 
of Smullyan’s books, more information is needed. Of course any new informa- 
tion must relate in some way to determining some individual’s type. This new 
information could be “knowing” the type of a particular individual, “knowing” 
the number of individuals of a particular type, etc. There are endless variations 
such as supposing that only the Knights have true statements and only Knaves 
have false ones, and a Normal must say both a true and a false statement in the 
puzzle. 

3 Diagrammatic Rules 

Given certain graph configurations, some of the values attached to the individual
nodes can be immediately determined. Rules will be written in a shorthand way
where premises are left unstated and the values on the nodes in the consequent
must be monotonically greater than those of the premises, i.e., ⊥, ⊥ < 1, ⊥, etc.
The values in the consequent give the least values that the nodes must take.

Rule 1

[Figure: diagrams 1a and 1b, each a node A with a self-arrow; consequent A: 1,1 in both.]

The first rule was presented (and argued for) previously. The rule on the right is
argued similarly: whatever A says cannot be true, but then A has just uttered
a truth and hence must be Normal. This means that any graph having these
particular forms of self-arrow must be evaluated as 1,1.

There are other rules; all will be presented in this shortened form, where the
antecedent of the rule is assumed to be a diagram (possibly with some values
filled in) and the consequence diagram has at least the newly introduced 1’s
(and possibly 0’s) as indicated. The general rule of thumb is that any ⊥ on a
node in the antecedent can be promoted to the value indicated in the consequence,
i.e., a 1. No 0 or 1 value in the antecedent can be changed to anything
else in the consequent.
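The rule of thumb can be sketched as a small update function on node values. This is only an illustration under assumptions: a node value is modelled as a pair (truth, lie), ⊥ is represented by `None`, and the names `promote` and `apply_consequent` are hypothetical.

```python
BOTTOM = None  # stands for the undetermined value ⊥


def promote(old, new):
    """Promote one component: ⊥ may become 0 or 1, a fixed 0 or 1 may not change."""
    if old is BOTTOM:
        return new
    if new is BOTTOM or new == old:
        return old
    raise ValueError(f"cannot change {old} to {new}")


def apply_consequent(value, consequent):
    """Apply a rule consequent (truth, lie) to a node value (truth, lie)."""
    return tuple(promote(o, n) for o, n in zip(value, consequent))


# A node valued 1,⊥ meets a consequent ⊥,1 and is promoted to 1,1.
print(apply_consequent((1, BOTTOM), (BOTTOM, 1)))  # (1, 1)
```

An attempt to overwrite a fixed component, for example applying a consequent 1,⊥ to a node valued 0,⊥, raises an error instead of silently changing the value.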

The next four rules handle pairs of arrows: 



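Rule 2

[Figure: diagrams 2a and 2b, each with nodes A and B joined by a pair of arrows. Consequents: 2a gives A: 1,⊥ and B: 1,⊥; 2b gives A: 1,⊥ and B: ⊥,1.]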



The argument for the first diagram is as follows: suppose A is telling the truth 
that B does not always lie, then B must at least have value 1,1. In that case, A 
has uttered a truth and hence his value is 1,1. Now suppose A is lying about B 
not always lying. So B always lies, and hence says that A must always tell the 
truth. This is a contradiction since the premise was that A lied in this case. 
Consequently, this branch of the case structure is impossible; it is only possible
that A tells the truth.

The argument for the second diagram is as follows: suppose that A is telling
the truth that B does not always lie. B says that A always lies. B must be lying
in this instance because the premise was that A is telling the truth. So A has
value 1, ⊥ and B has value ⊥, 1.

Now suppose that A is lying and hence that B always lies. B is lying in
saying that A always lies. So A must at least have the value 1, 1. And since B
is lying in this instance, he must have value ⊥, 1. In both cases (the minimum
common information), A has value 1, ⊥ and B has value ⊥, 1.

Rule 3

[Figure: diagrams 3a and 3b, each with nodes A and B joined by a pair of arrows. Consequents: 3a gives A: ⊥,1 and B: 1,⊥; 3b gives A: ⊥,1 and B: ⊥,1.]

The argument for the first diagram is as follows: assume A is telling the truth,
and so B always tells the truth. Since B says that A can sometimes lie, A is
valued at 1, 1 and B is valued at 1,0.

Now suppose that A is lying about B always telling the truth. B is expressing
a truth in this instance by claiming A doesn’t always tell the truth. So A is valued
at ⊥, 1 and B at 1, 1. In both cases, we have at least A : ⊥, 1 and B : 1, ⊥.

The argument for the second diagram is as follows: assume A is telling the 
truth. Then B always tells the truth, and hence B claiming that A always lies 
is a contradiction. So this case cannot happen. 

Suppose A is lying and hence has value ⊥, 1. Then B sometimes lies, and
hence B has value ⊥, 1.
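The consequents of Rule 3 can be cross-checked by enumerating person types. The sketch below is an illustration under assumptions, not the paper's procedure: "always tells the truth", "can sometimes lie" and "always lies" are encoded as `must_tell_truth`, `can_lie` and `must_lie`; a truth component of 1 is read as "is not a Knave in any solution" and a lie component of 1 as "is not a Knight in any solution". All function and variable names are hypothetical.

```python
from itertools import product

TYPES = ("Knight", "Knave", "Normal")
ATOMS = {
    "can_lie": lambda t: t != "Knight",
    "must_tell_truth": lambda t: t == "Knight",
    "must_lie": lambda t: t == "Knave",
}


def solutions(people, statements):
    """All type assignments in which Knights speak truly and Knaves falsely."""
    found = []
    for types in product(TYPES, repeat=len(people)):
        assignment = dict(zip(people, types))
        ok = True
        for speaker, atom, target in statements:
            is_true = ATOMS[atom](assignment[target])
            if assignment[speaker] == "Knight" and not is_true:
                ok = False
            if assignment[speaker] == "Knave" and is_true:
                ok = False
        if ok:
            found.append(assignment)
    return found


def least_value(person, sols):
    """(truth, lie) components forced in every solution; '⊥' where nothing is forced."""
    truth = 1 if all(s[person] != "Knave" for s in sols) else "⊥"
    lie = 1 if all(s[person] != "Knight" for s in sols) else "⊥"
    return (truth, lie)


# 3a: A says "B always tells the truth", B says "A can sometimes lie".
rule_3a = [("A", "must_tell_truth", "B"), ("B", "can_lie", "A")]
# 3b: A says "B always tells the truth", B says "A always lies".
rule_3b = [("A", "must_tell_truth", "B"), ("B", "must_lie", "A")]

for name, stmts in (("3a", rule_3a), ("3b", rule_3b)):
    sols = solutions(["A", "B"], stmts)
    print(name, "A:", least_value("A", sols), "B:", least_value("B", sols))
```

Under this type-based reading the enumeration gives A: ⊥,1 and B: 1,⊥ for 3a, and A: ⊥,1 and B: ⊥,1 for 3b, in agreement with the arguments above.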

The next four rules handle specific triples of arrows: 

Rule 4

[Figure: diagrams 4a and 4b, triples of arrows among A, B, and C.]

truth. Hence C must sometimes or always lie. But A claims C never lies, so this
case cannot happen and A must be lying and hence has value ⊥, 1.



Rule 5

[Figure: diagrams 5a and 5b, each with arrows from A to B, from A to C, and from B to C; consequent A: ⊥,1 in both.]



For both diagrams, assume A is telling the truth that B always lies. Then in
both cases, B is lying about C’s ability to tell the truth, i.e., it is indeed possible
for C to tell the truth. Hence A’s statement that C always lies is false, so A is
lying about this. Hence A’s value is ⊥, 1. Now assume for both diagrams that A
is lying. Then A must have at least the value ⊥, 1, and hence A has ⊥, 1 in both
cases.

The last rule is really a collection of four rules where the right-hand and
bottom arrows must be of the same type, and this type can be any of the four
possible types:

Rule 6

[Figure: diagram with nodes A, B, and C; the arrows from A to C and from B to C are of the same type.]




Assume A is telling the truth about B’s always lying. Then if B says the same
thing about C as A says, then they must both be lying about C and hence A
must have at least value ⊥, 1. On the other hand, suppose A is lying about B;
then A must at least have value ⊥, 1.



4 Non-monotonic Reasoning 

There is an approach to non-monotonic reasoning hinted at in [BS97]. It involves 
switching logics rather than using non-monotonic operators within a logic. There 
is a certain amount of setup necessary to explain the notion and how it can be 
used in the KNK-puzzles. 

Some remarks about what constitutes “a logic” need to be made here. Con- 
sider a logic to be a collection of sentences and a collection of interpretations, 
both always paired together. What constitutes a rule is anything which is logically
valid, i.e., never leads from true premises to false conclusions. If one changes
the collection of interpretations, then the valid rules change. Clearly, these are
not proof-theoretic rules but merely expressions of logical consequence. Also, the
collection of sentences can change, as with the inclusion of new information. When
either the collection of sentences or the collection of interpretations changes, the
logic is transformed into a new one. There is some relationship with the old logic
before the change, and this relationship is the subject of the following subsection. 

Assume you are solving a puzzle based upon a conversation you are overhearing.
Consequently, you do not know when the persons have finished speaking.
This amounts to always thinking, after every statement you have heard, that you
have complete information at that point, and you proceed to reason based upon
this additional assumption. The effect of this is to treat valuations of the form “A
never lies” and “A never tells the truth” as somehow being susceptible to change.
This yields a partial order ⊑ such that 0 ⊑ 1. In English, this means that a 0
(standing for “never”) can change into a 1 but a 1 can never change into a 0.
In short, you can always tell a lie or a truth in the future, but you can never revoke
a previous statement.

On the other hand, the previous diagrammatic rules can add some 1’s to the
solution, but never add 0’s. Not all the information in puzzles comes from these
rules, however, so it can happen that a piece of information which was believed
to be true turns out to be false, or vice versa. Such an example will be presented
later in this section.

4.1 Classifications and Infomorphisms

Classifications come up throughout logic. They are composed of two sets and 
a binary relation. The relation has elements in the first position from the first 
set and elements in the second position from the second set. A typical exam- 
ple is the collection of sentences in some language (second set), a collection of 
interpretations of those sentences (first set), and a relation telling us which in- 
terpretations make which sentences true. These kinds of classifications are used 
in the example of this section. 

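Definition 4. A classification $X = (\mathrm{Tok}(X), \mathrm{Typ}(X), \models_X)$ is composed of
a set of tokens $\mathrm{Tok}(X)$, a set of types $\mathrm{Typ}(X)$, and a relation
$\models_X \subseteq \mathrm{Tok}(X) \times \mathrm{Typ}(X)$.

An infomorphism is a morphism of classifications. It allows the movement of
information in such a way that the relations are respected.

Definition 5. An infomorphism $h : X \to Y$ from the classification
$X = (\mathrm{Tok}(X), \mathrm{Typ}(X), \models_X)$ to another classification
$Y = (\mathrm{Tok}(Y), \mathrm{Typ}(Y), \models_Y)$ is a pair of maps
$h^{\wedge} : \mathrm{Typ}(X) \to \mathrm{Typ}(Y)$ and $h^{\vee} : \mathrm{Tok}(Y) \to \mathrm{Tok}(X)$
which satisfy the following condition for all $y \in \mathrm{Tok}(Y)$ and $S \in \mathrm{Typ}(X)$:

$y \models_Y h^{\wedge}(S)$ iff $h^{\vee}(y) \models_X S$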

An infomorphism is somewhat easier to see using a diagram. 

$\mathrm{Typ}(X) \xrightarrow{\;h^{\wedge}\;} \mathrm{Typ}(Y)$

$\mathrm{Tok}(X) \xleftarrow{\;h^{\vee}\;} \mathrm{Tok}(Y)$

Notice this is not a commutative diagram. The condition that an infomorphism 
must satisfy must be specified externally. 
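For finite classifications the externally specified condition can be checked mechanically. The following sketch assumes a classification is stored as a triple of Python sets (tokens, types, relation) and that the type map and token map are given as dictionaries; the function name `is_infomorphism` and the tiny example classifications are hypothetical.

```python
def is_infomorphism(X, Y, up, down):
    """Check y |=_Y up(S) iff down(y) |=_X S for all tokens y of Y and types S of X.

    X and Y are triples (tokens, types, relation), where relation is a set of
    (token, type) pairs; up maps types of X to types of Y, and down maps
    tokens of Y to tokens of X.
    """
    _, typ_x, rel_x = X
    tok_y, _, rel_y = Y
    return all(
        ((y, up[S]) in rel_y) == ((down[y], S) in rel_x)
        for y in tok_y
        for S in typ_x
    )


# Hypothetical example: Y extends X by one extra type satisfied by its token.
X = ({"x"}, {"A1"}, {("x", "A1")})
Y = ({"x2"}, {"A1", "B1"}, {("x2", "A1"), ("x2", "B1")})
up = {"A1": "A1"}      # inclusion of types
down = {"x2": "x"}     # the token map forgets the extra information
print(is_infomorphism(X, Y, up, down))  # True
```

The two maps run in opposite directions, which is why the square of maps does not commute and the biconditional has to be tested pair by pair.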

To give an idea of what the definition of infomorphisms entails, it is helpful to
make the following definition and subsequent theorem. The proof of the theorem
is easy. This shows that infomorphisms are monotone maps on tokens and types
in their induced preorders.

Definition 6. Given a classification $X$, the induced preorders on
$\mathrm{Typ}(X)$ and $\mathrm{Tok}(X)$ are defined by

$P \leq_X Q$ iff $\mathrm{Tok}(P) \subseteq \mathrm{Tok}(Q)$, and $x \leq_X y$ iff $\mathrm{Typ}(x) \subseteq \mathrm{Typ}(y)$,

where $\mathrm{Tok}(P)$ is the set of tokens of type $P$ and $\mathrm{Typ}(x)$ is the set of types of the token $x$.

Theorem 1. Let $h : X \to Y$ be an infomorphism. Then $P \leq_X Q$ implies
$h^{\wedge}(P) \leq_Y h^{\wedge}(Q)$, and $x \leq_Y y$ implies $h^{\vee}(x) \leq_X h^{\vee}(y)$.

Definition 7. A constraint $\Gamma \vdash_X \Delta$ in the classification $X$ is two collections
of types, $\Gamma$ and $\Delta$, such that every token which satisfies every type in $\Gamma$, i.e.,
$x \models_X S$ for all $S \in \Gamma$, must also satisfy at least one type of $\Delta$.

This allows us to express restrictions on tokens and is the usual notion of
logical consequence when $\Gamma$ and $\Delta$ are collections of sentences. Notice that no
additional structure is imposed on $\Gamma$ or $\Delta$. They are merely unordered collections
of types. It is the set-forming operation and the interpretation of what it means
to satisfy $\Gamma$ and $\Delta$ in the definition that give $\vdash_X$ logical force. Of course, more
structure could be added to these two sets if needed.
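Definition 7 translates directly into a check over the tokens of a finite classification. The sketch below is illustrative only: the representation of a classification as (tokens, types, relation), the function name `holds`, and the token and type names are assumptions.

```python
def holds(classification, gamma, delta):
    """Gamma |- Delta: every token satisfying all of gamma satisfies some type in delta."""
    tokens, _, rel = classification
    return all(
        any((x, T) in rel for T in delta)
        for x in tokens
        if all((x, S) in rel for S in gamma)
    )


# Hypothetical classification with one token satisfying two types.
types = {"A1", "C_knight"}
one_token = ({"x"}, types, {("x", "A1"), ("x", "C_knight")})
print(holds(one_token, {"A1"}, {"C_knight"}))   # True

# Adding a token that satisfies A1 but not C_knight gives a counter-example.
two_tokens = ({"x", "z"}, types, {("x", "A1"), ("x", "C_knight"), ("z", "A1")})
print(holds(two_tokens, {"A1"}, {"C_knight"}))  # False
```

Whether a constraint holds therefore depends on which tokens the classification contains, which is what makes a change of classification able to invalidate an earlier conclusion.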

4.2 An Example 

The following assumptions will be made: 

1. A normal person must say at least one true and at least one false statement 
within the puzzle; 

2. Information about the puzzle is deemed complete after each person has spo- 
ken. 

The first assumption amounts to a collection of constraints of either of the
following two forms:

$U : 1,1 \vdash_X TS(U)$ and $U : 1,1 \vdash_X FS(U)$,

where U : 1,1 refers to “U is a normal person”, TS(U) refers to “U says at least
one true statement”, and FS(U) refers to “U says at least one false statement”. U
is a metalinguistic variable ranging over A, B, C, etc.

Here, the notation will change slightly. Previously, the person, say, A, was
identified with the statements s/he made. Now, “A” will refer to the individual A
and “$A_i$” will refer to the i-th statement made by A.

$A_1$: I (myself) can lie.

$A_2$: B also can lie.

You have enough information to make an attempt at solving the puzzle. 



[Figure: node A: 1,1 with a self-arrow and an arrow to node B: 1,0.]

Because A has a basic “I can lie” loop, A must be valued at 1,1. The statement
$A_1$ s/he utters must be true, as assuming it is false leads to a contradiction.
Consequently, since a normal person must also say something false (the first
assumption), the statement A utters about B must be false and B cannot lie.
Therefore, B must be a Knight and valued at 1,0.

A classification containing the above information has the collection of types

$\mathrm{Typ}(X) = \{A_1, A_2, (A : i,j), (B : i,j), TS(A), TS(B), FS(A), FS(B)\}$

for $i, j$ ranging over $\{\bot, 1, 0\}$, and one token, x, satisfying $A_1$, (A : 1,1), and
(B : 1,0). From the constraints, you can infer TS(A) and FS(A).

Suppose now that B speaks up:

$B_1$: C must tell the truth.

You may then conclude, “Ah, then C is also a knight”, yielding the following
diagram:

[Figure: node A: 1,1 with a self-arrow, node B: 1,0, and node C: 1,0.]

The types of X must be extended to include C : i,j for $i, j \in \{\bot, 1, 0\}$, and the
interpretation x is also suitably extended to a new interpretation x' which is just
like x except that $x' \models_{X'} C : 1,0$ and $x' \models_{X'} B_1$.

Since the collection of types and tokens has changed, a new classification, X',
must be used to capture this information. This new classification is linked to the
old one via an infomorphism $h : X \to X'$ such that $h^{\wedge}(\mathrm{Typ}(X)) \subseteq \mathrm{Typ}(X')$
and $h^{\vee}(x') = x$, i.e., h “loses” the extra information that token x' contained
about C : 1,0.

There is also a constraint which is satisfied in the new classification, namely

$A_1, A_2, B_1 \vdash_{X'} C : 1,0$




Now A speaks up again:

$A_3$: C must lie.

Observing the situation, it must be a false statement, but there are two possible
cases. Since A is normal, either the situation is as in the diagram on the left below,
or $A_2$ turns into a true statement, yielding the diagram on the right.

[Figure: two candidate diagrams; the surviving labels are A: 1,1 and C: 1,0.]

Clearly, there are now two new interpretations, y and z, which will be used to
account for $A_3$. The interpretation y, which fails to satisfy $A_3$, extends x'.
However, z fails to satisfy C : 1,0. So z does not extend x'.

Since there is a new type, $A_3$, if the reasoning were monotone, then

$$\frac{A_1, A_2, B_1 \vdash C : 1,0}{A_1, A_2, A_3, B_1 \vdash C : 1,0}\ \textit{thinning}$$

would be a valid rule. It cannot be, because it contradicts the diagram on the right,
which clearly states that C can lie. In other words, z provides a counter-example
to the conclusion.

For the interpretation containing y, there is a new classification Y which is
derived from X' much like X' was derived from X. However, for z, there are two
new classifications, Z and Z'. Z has the types $\mathrm{Typ}(Z) = \mathrm{Typ}(X') \cup \{A_3\}$ and
the same tokens as X'. The infomorphism connecting the two goes from X' to
Z, i.e., it is an injection on types from Typ(X') to Typ(Z) and it is an identity
on tokens from Tok(Z) to Tok(X'). The second new classification is Z' and is
connected by an infomorphism to Z which is the identity map on Typ(Z') to
Typ(Z). The token map from Tok(Z) to Tok(Z') must turn the token x' into
the token z. That is, it must iron out the inconsistency that z does not extend x'.
The classification picture is now as follows:

[Figure: classification diagram with X above X', and Y, X', Z, and Z' in a row joined by infomorphisms.]



Next, C speaks up:

$C_1$: B can lie,

$C_2$: but A cannot lie.

Considering the left diagram of the puzzle above, C cannot be telling the
truth in both $C_1$ and $C_2$, because of the claim that A cannot lie. (This
information is fixed, coming from a diagrammatic rule.) The first puzzle diagram
above is incompatible with C’s statements and hence the branch of reasoning
which it represents is closed.

There will need to be a new classification, say W, which is an extension of
Z'. The final solution diagram and classification diagram are as follows:

[Figure: final solution diagram and classification diagram.]

As the previous example showed, there is a non-monotonic part of the reasoning
about B:

$A_1, A_2, B_1 \vdash B : 1,0$

$A_1, A_2, A_3, B_1, C_1, C_2 \vdash B : 0,1$

This inference is not based only on diagrammatic rules.
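The change in the conclusion about B can be reproduced with a brute-force enumeration. The sketch below makes several assumptions that are not in the text: "can lie", "must tell the truth" (also used for "cannot lie") and "must lie" are read as properties of person types, and both assumptions of this section are applied strictly to whatever statements have been heard so far. It does not model the intermediate branching into y and z, only the two sequents about B. All names are hypothetical.

```python
from itertools import product

TYPES = ("Knight", "Knave", "Normal")
ATOMS = {
    "can_lie": lambda t: t != "Knight",
    "must_tell_truth": lambda t: t == "Knight",
    "must_lie": lambda t: t == "Knave",
}


def consistent(assignment, statements):
    """Closed-world check: Knights only say truths, Knaves only falsehoods,
    and a Normal must have said at least one truth and one falsehood."""
    for person, kind in assignment.items():
        said = [
            ATOMS[atom](assignment[target])
            for speaker, atom, target in statements
            if speaker == person
        ]
        if kind == "Knight" and not all(said):
            return False
        if kind == "Knave" and any(said):
            return False
        if kind == "Normal" and not (any(said) and not all(said)):
            return False
    return True


def possible_types(person, people, statements):
    """Types the person takes in at least one consistent assignment."""
    return {
        types[people.index(person)]
        for types in product(TYPES, repeat=len(people))
        if consistent(dict(zip(people, types)), statements)
    }


people = ["A", "B", "C"]
A1 = ("A", "can_lie", "A")
A2 = ("A", "can_lie", "B")
A3 = ("A", "must_lie", "C")
B1 = ("B", "must_tell_truth", "C")
C1 = ("C", "can_lie", "B")
C2 = ("C", "must_tell_truth", "A")  # "A cannot lie"

print(possible_types("B", people, [A1, A2, B1]))
print(possible_types("B", people, [A1, A2, A3, B1, C1, C2]))
```

With the first three statements B can only be a Knight (value 1,0); with all six statements B can only be a Knave (value 0,1). Adding premises has reversed the conclusion, so thinning fails under the closed-world reading.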

5 Conclusion 

The diagrammatic logic presented does simplify the reasoning with these sorts of
puzzles by showing that some of the properties of the individuals are forced or, to
put it another way, that the rules are valid. We believe that these sorts of
diagrammatic logics, while also representable as sentential logics, have a better fit to
the conceptual geometry of the puzzles. By conceptual geometry, we mean a mental
picture of what the major parts of the puzzle are, how they are connected, what
the known state of the information is at any point of the reasoning, etc. It is this
conceptual geometry which drove our development of the rules. This dual use
of the geometry, i.e., as a representation for puzzle solvers and a representation
for ourselves as logic developers, is the foundational premise for diagrammatic
reasoning.

Non-monotonicity is always a difficult problem and we have only scratched
the surface here. We expect to develop Barwise and Seligman’s basic insight into
the nature of non-monotonic phenomena in our future research. The analysis
here represents some of the internal plumbing of reasoning systems and is not
intended for use by puzzle solvers. It encourages us, however, that the
non-monotonic mechanisms do have a graphical representation in terms of
category-theoretic diagrams.

Abstract. This paper presents a topological and game-theoretic extension
of the system of Existential Graphs (EGs). EGs were Charles
S. Peirce’s diagrammatic and iconic approach to logic. By scribing the
graphs on assertion spaces of higher dimensions, this extension provides
the precise iconic counterpart to the Independence-Friendly (IF) restatement
of first-order logic suggested by Hintikka. Consequently, the IF extension
completes the project that Peirce initiated: it breaks off from the
linear confines of language by diagrams that extend to three dimensions,
which Peirce predicted to be necessary and sufficient for the expression of
all assertions. Apart from improved ways of performing conceptual modelling
on natural-language expressions, this extension reveals the true
proportions of Peirce’s sign- and model-theoretic thinking in plunging
into the notions of identity, negation, continuity and quantification.



1 Introduction 

This paper presents a novel topological and game-theoretic extension of the
diagrammatic logic of the American scientist and philosopher Charles S. Peirce
(1839-1914). The extension is related to the family of Independence-Friendly (IF)
logics [9, 12]. This term refers to different ways of removing restrictions on the
ordinary linear dependency patterns between quantifiers and logical connectives,
and replacing them, in the first instance, with partial orders. It is argued in [9]
that IF logic is our true elementary logic.

In this introductory section, before moving to the details of the underlying
architecture of Peirce’s diagrammatic system and its IF extension, I make a couple
of general remarks concerning the history and the overall value of his system.
The focus is on the BETA part of his Existential Graphs (EGs) [3, 28, 29, 32, 34].

Peirce, probably the greatest ‘knowledge engineer’ of the second industrial
revolution, developed his graphical system in three phases. The first part, known
as the ALPHA part, was inspired, among others, by Alfred B. Kempe’s endeavours
during the 1870s to improve Euclid’s and John Venn’s graph notation and to
prove the four-colour theorem. Its theory is isomorphic to the theory of
propositional logic. The second came in a sequence of two: first as the entitative
graphs, soon overridden (in 1896) by the dual existential version. Its theory is
isomorphic to the theory of predicate logic with quantifiers and identity.

* Supported by the Academy of Finland and the Hungarian Academy of Sciences
(Project No. 104262: Communications in the 21st Century: The Relevance of

As Peirce held anything “extralogical” to be mere shibboleth (2.532, 1893),¹
the third phase, known as the GAMMA part, was a cluster of attempts to deal
with a variety of notions such as modality, including temporal concepts. Peirce
envisaged several methods to accomplish this, including the classification of
predicates that would denote different “potentials” (4.524-526, 1903; MS 508: B.6),
and dotted closed lines to represent a unary modal operation on graphs in their
interior. GAMMA was also an attempt to reason about graphs themselves as well
as to deal with higher-order notions of collections and quantification of properties,
to capture the diagrammatic content of sentences such as “Aristotle has all
the virtues of the philosopher”. Hence Peirce came to instigate the concept of
abstraction, which marked an early anticipation of category theory. Regrettably,
he failed to separate GAMMA compartments from each other (for instance, in
defining collections Peirce used the notion of abstraction).

Two preliminary remarks. 

First, it is not generally known that Peirce also projected a fourth part called
DELTA, which was in his own words needed “in order to deal with modals” (MS
500: 2-3, 1911). I take this remark to indicate that the modal
compartment of GAMMA was no good in Peirce’s judgement. Whether abstraction
or collections fare better has nevertheless not yet been settled by Peirce scholars.
Apart from the occasional reference, no further documentation on the planned
DELTA has survived.

Second, to say that theories of any two systems are isomorphic is a very
weak assertion, since it does not imply anything about the iconic aspects of
Peirce’s system, which were of central concern for him in devising these graphs.
Iconicity was meant to capture the fact that a representation should share the key
structural similarities with what it is meant to represent. As Peirce’s goal
was to capture a “moving picture of the action of the mind in thought” (MS 298:
1, 1905; [25]), he was after the same things as cognitive scientists using logic in
knowledge representation tasks, or cognitive linguists in regarding meaning as
conceptualisation, but a hundred years ahead of them.

The significance of iconicity in Peirce’s overall system of logic has been cogently
emphasised e.g. in [10]. The novelty of the innovation was probably one
of the main reasons why Peirce’s diagrammatic systems lay dormant for nearly
a century. Another reason was that he provided no recursive definition of syntax:
his definitions were, by contrast, explicit, their aim being not to start
from primitive concepts and relations and map them morphically to more
complex concepts, but to list the boundary conditions (his graph conventions)
that these diagrams need to satisfy.

This paper puts the existential BETA graphs in the limelight. For reasons of space,
I will ignore anything related to GAMMA. My aim is to extend the system
of EGs to an iconic system that correlates with the IF first-order logic of [9, 12]. I
will show that such an extension is indeed feasible in diagrammatic contexts, and
that it not only provides us with more expressive resources for the diagrammatic
representation of our ‘moving pictures’, but also has significant repercussions for
Peirce’s overall iconic, symbol-free programme in logic.

¹ I will use the standard references to Peirce’s oeuvre: 1.123 is to [16] by volume and
paragraph number, MS is to [17] by manuscript and, if applicable, page number, and
SS is to [4] by page number.

I will ignore the reasoning element in graphs, namely the one concerning the set
of meaning-preserving transformations from one graph into another. With only
minor modifications, Peirce’s transformation rules for BETA were shown to be
complete by J. Jay Zeman in 1964 [34].

Some have taken the question of what such inference rules are to be indicative
of how one should “read off” these graphs [32]. In contrast to this inferential
project, I will show that graphs should be studied by experimenting upon them
in a model-theoretic and, in particular, game-theoretic manner. This perspective
was so important to Peirce that it earned the special term Endoporeutic
Method (4.561, 4.568, c.1905-06; MS 293: 51, 53, c.1906; MS 514: 16, 1909; MS
650: 18, 19, 1910; MS 669: 2, 1911; [26]). It refers to the method by which
information traverses these topological diagrams in an outside-in manner.
I will revisit aspects of endoporeutics in later sections.

2 Peirce’s Existential Graphs Explained 

2.1 Alpha and Beta Systems 

Let us review the rudiments of EGs, focussing on the system BETA. The fundamental
notion in any graph is the Sheet of Assertion (SA) on which graphs are
scribed. We may take the SA to be an open-topological manifold. It provides the
universe of discourse, which is a collection of individuals. A blank SA on which
nothing is drawn represents a zero-place constant of truth. (Peirce’s earlier
entitative graphs were duals to this and so a blank SA represented falsity.)

The basic operation in both ALPHA and BETA is composition (or juxtaposition)
of graphs and atomic nodes termed spots. Peirce took composition to
be analogous to Boolean conjunction. The spots (as well as their hooks in the
BETA case) are uninterpreted (“unanalysed”, 4.441) atomic propositions termed
rhemas, which have blank lines of expression that need to be interpreted (MS
491: 3-4, c.1903). Taking juxtaposition to define isotopy-equivalence classes, it
is seen that the orientation of juxtaposed graphs on an SA does not matter for
truth or falsity and conjunction is commutative and associative.

The other basic operation in EGs is the cut, which is a closed simple curve
around graphs. Peirce took this operation to be analogous to Boolean negation.
The operation is very iconic in the sense that it literally severs its interior area
from the SA and thus asserts the denial of what the area contains. What remains
on the sheet is the place of the area of the cut. Thus, a cut drawn on a blank SA
represents a zero-place constant of falsity.

By way of a simple example, the ALPHA graph on the left in (1) is the
diagrammatisation of the propositional formula on the right:

[Figure: ALPHA graph]    $(S_1 \lor S_2) \land (S_3 \lor S_4)$    (1)
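The ALPHA reading of juxtaposition as conjunction and of the cut as negation can be sketched as a small evaluator. This is an illustration under assumptions: an area is modelled as a list of items, a spot as a string, and a cut as a tagged tuple; since the drawing for (1) is not reproduced here, the encoding of the graph as two pairs of doubly nested cuts is a guess at its shape, not a transcription. The names `cut`, `eval_item` and `eval_area` are hypothetical.

```python
from itertools import product


def cut(*items):
    """A cut enclosing the given items (spots or further cuts)."""
    return ("cut", list(items))


def eval_item(item, valuation):
    """A spot takes its value from the valuation; a cut denies its area."""
    if isinstance(item, str):
        return valuation[item]
    _, inner = item
    return not eval_area(inner, valuation)


def eval_area(items, valuation):
    """An area asserts the conjunction of everything scribed on it."""
    return all(eval_item(item, valuation) for item in items)


# (S1 or S2) and (S3 or S4), written with cuts and juxtaposition only:
# not(not S1 and not S2) juxtaposed with not(not S3 and not S4).
graph = [cut(cut("S1"), cut("S2")), cut(cut("S3"), cut("S4"))]

names = ["S1", "S2", "S3", "S4"]
agree = all(
    eval_area(graph, dict(zip(names, vals)))
    == ((vals[0] or vals[1]) and (vals[2] or vals[3]))
    for vals in product([False, True], repeat=4)
)
print(agree)              # True
print(eval_area([], {}))  # blank sheet: True
```

The empty conjunction makes the blank sheet true, and `eval_item(cut(), {})` is false, matching the zero-place constants of truth and falsity described above.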



Moving on to BETA graphs, the third key iconic sign that Peirce introduced was
the Line of Identity (LI). It connects spots such that any spot has a periphery
upon which a finite number of imaginary hooks are pictured, and to which an LI
may connect. If an LI does not connect to a hook, its unconnected extremity is
a loose end. An LI that connects to only one hook and does not have a loose end
reduces to a zero-dimensional dot. Peirce took it to be analogous to existential
quantification. Iconically speaking, a dot on an SA singles out an individual
subsisting in the universe of discourse.

At most one LI or a dot may occupy a hook (and hence follows Peirce’s famous
irreducibility of triadic relations). The number of connections at the periphery
of a spot corresponds to the arity of a predicate. LIs may be connected to each
other. The totality of connected LIs gives rise to a ligature. Any line that crosses
a cut is a ligature composed of two lines. Like subgraphs, LIs in ligatures are
compositions read as conjunctions (e.g., ‘there exists b and this b is not S’).
Attached to a predicate, lines of the same length and with a loose end give rise
to an isotopy-equivalence class, and thus their order is irrelevant.²

The BETA graph on the left in (2) is a diagrammatisation of the first-order
formula on the right:

[Figure: BETA graph]    $\forall x\,(S_1 x \to S_2 x)$    (2)

The EG in (2) has two cuts, two rhemas (predicates) $S_1$ and $S_2$, and one LI
abutting a cut. It asserts that, given any individual object of the universe of
discourse of which $S_1$ is true, $S_2$ is true. The two nested cuts denote implication
(the scroll), and the heavy line of identity asserts at its outermost extremity
that there is an individual in the universe of discourse.

Observe how the eight different EGs below diagrammatise the correlated symbolic
formulas:

[Figure: eight EGs with their formulas; the formulas recoverable here are $\forall x\, Sxx$, $\forall x\,(S_1 x \land \exists y\, S_2 y)$, $\forall x \exists y\, Sxy$, and $\exists x \sim Sx$.]

² The isotopy-equivalence is in good accordance with Peirce’s remark that, “If either
proponent or opponent has, at any stage, several individuals to select successively,
it is obvious that the order in which he names them will be indifferent, since he will
decide upon them in his own mind simultaneously” (MS 430: 62, 1902).

Two points here. The BETA system was inspired by considerations in chemistry.
Atoms of chemical elements have “valencies”, namely bondings between atoms.
This reflects relational composition with a bound variable, isotopic molecules
being those in which the order of bonds and the orientation of the composite
structure in space does not change their classification. Analogously, the order in
which LIs or dots are connected to hooks of any spot, and the orientation on the
SA, are irrelevant in terms of the truth-value of the graph.

The other observation is that Peirce held the three elements of rhema, proposition
and argument to be continuous with one another (Letter to Welby, 1908,
SS: 72; 4.438). Such continuity is illustrated in BETA graphs in terms of rhemas
(predicate terms) being continuously connected by LIs or by composition, both
of which, topologically speaking, express connectivity between different parts of
the surface, thus forming propositions, and also in terms of propositions being
connected, in the equally topological sense, under continuous deformations from
one graph to another, thus forming inferential arguments. This is yet another
reason why Peirce preferred using topological concepts, but all the same fell short
of possessing some of the key topological definitions. The upshot was that he
did not have a clear, unambiguous view of the topological structure of EGs [27].
A reconstruction has recently been undertaken in [1].

2.2 On Identity 

Let us make an intermediate conclusion concerning identity in EGs. Hintikka [8]
has argued that the verb for being only apparently posits multiple readings, none
of which is logically separate from the others. In other words, there only appears
to be the is of identity, the is of predication (copula), the is of existence and
the is of subsumption (class-inclusion).

The BETA system reveals that a single sign, namely that of the identity line, 
works for all these senses: 

- The is of identity is represented by the two extremities of the line connected
  to each other and to the spots.

- The is of predication is represented by the connection of a line with a hook
  of the spot along with the interpretation of the line. Such interpretation
  was occasionally termed by Peirce the selective, which he thought will tell
  us “how to proceed in order to experience the object intended” (MS 484:
  5, 1898) in terms of the assertion containing either the particular indefinite
  some or the universal any (N.B.: Peirce never took any to have existential
  readings).³

- The is of existence is represented by the outermost extremity of the line
  being enclosed within an even number of cuts.

- Only in the is of subsumption do we additionally need the iconic sign of the
  scroll, which nevertheless is nothing but two nested cuts, in order to represent
  class-inclusion. In that case, too, the identity line is needed to traverse from
  the area of the outer cut into the area of the inner cut, to represent sentences
  such as ‘Every A there is, is B’ (cf. the BETA graph in (2)).

[Figure: further EGs with their formulas, including $\exists x\, Sx$, $\exists x\, Sxx$, and $\exists x \exists y\, Sxy$.]

In the BETA system we thus have a remarkable vindication of Hintikka’s point
that different aspects of identity do not reflect any fundamental logical difference,
and may be treated in EGs in a unifying and synonymous manner, by a single
logical sign or the single verb ‘to be’.

To be precise, LIs play yet a fifth role: they may be used to mark the
coreference relations in anaphoric discourse. Well-known examples are provided
by different kinds of donkey anaphora. A more complex case is the so-called
Bach-Peters sentence “The boy who was fooling her kissed the girl who loved
him”. This may be represented by the following EG:



[Figure: EG with the rhemas ‘the boy’, ‘the girl’, ‘fools’, ‘loved’, and ‘kissed’ connected by lines of identity.]    (3)



Countless other examples of intersentential discourse involving anaphoric references
may be rigorously captured by these diagrammatic methods. To mention
just one further, hitherto unacknowledged possibility: GAMMA graphs may be
reconstructed with identity lines to capture anaphora in multi-modal contexts [13, 18].



3 IF-ing on Existential Graphs 

3.1 IF Logic 

Ordinary classical first-order logic forces quantifiers and connectives into a
linear order, to be read and interpreted from left to right, so that any
information concerning the values of quantifiers will propagate down to subformulas.

3 Hence, selectives refer to instantiation rather than quantification. The kind of copula 
that predication is able to express was defined by Peirce via the connection between 
the representation and the universe of discourse. 
IF (Independence-Friendly) logic rejects this [9, 12, 24, 30, 31]. For instance,
instead of writing classically that $\forall x \exists y\, Sxy$ (‘For all x, there exists a y such that
Sxy’), we may write $\forall x (\exists y/x)\, Sxy$ (‘For all x, there exists a y independently
of x such that Sxy’). The phrase ‘independently of’ is given precise meaning
in terms of game-theoretic semantics (GTS) [2, 6, 20, 21, 22, 24, 25]. Whereas
formulas of classical logic are correlated with games of perfect information,
GTS for IF logic also accommodates games of imperfect information.
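To make the game-theoretic reading concrete, here is a minimal Python sketch for the single formula $\forall x (\exists y/x)\, Sxy$ on a non-empty finite model. It assumes the usual reading in which truth is the existence of a winning strategy for the verifier and falsity the existence of one for the falsifier; the function name `if_value` and the encoding of S as a Python predicate are illustrative choices, not part of the original text.

```python
def if_value(domain, S):
    """Classify forall x (exists y / x) S(x, y) on a non-empty finite model.

    The verifier chooses y without seeing x, so a verifier strategy is a
    single element of the domain; the falsifier moves first, so a falsifier
    strategy is also a single element.
    """
    domain = list(domain)
    # The verifier wins with one y that works against every x.
    if any(all(S(x, y) for x in domain) for y in domain):
        return "true"
    # The falsifier wins with one x that defeats every y.
    if any(all(not S(x, y) for y in domain) for x in domain):
        return "false"
    # Neither player has a winning strategy.
    return "undetermined"


print(if_value([0, 1], lambda x, y: x == y))
print(if_value([0, 1], lambda x, y: y == 0))
```

On a two-element domain with S read as equality, neither player can win without knowing the other's move, so the sketch reports the formula as neither true nor false; this is the kind of truth-value gap that games of imperfect information allow.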

We get IF first-order logic by adding to the classical formation rules the following.
Let $Qx\,\psi$, $Q \in \{\forall, \exists\}$, and $\phi \circ \psi$, $\circ \in \{\land, \lor\}$, be first-order formulas in the scope
of $Q_1 x_1 \ldots Q_n x_n$, in which $A = \{x_1 \ldots x_n\}$. If $B \subseteq A$, then $(Qx/B)\,\psi$ and
$\phi\,(\circ/B)\,\psi$ are well-formed formulas of IF first-order logic.

For instance, $\exists x\,(S_1 x\,(\lor/x)\,S_2 x)$ and $\sim\forall x_1 \ldots \forall x_n (\exists y/x_1 \ldots x_n)\, S x_1 \ldots x_n y$
are wffs of IF logic.

The general idea has been known under the name of Henkin quantifiers since [5].
Henkin quantifiers represent independence by organising quantifiers into
two-dimensional arrays:

$$\begin{pmatrix} \forall x_1 \ldots \forall x_n\, \exists y \\ \forall z_1 \ldots \forall z_m\, \exists u \end{pmatrix} S x_1 \ldots x_n z_1 \ldots z_m y u. \qquad (4)$$



IF logics generalise the idea to all kinds of dependencies and independencies.
Let us consider the IF first-order formula of the form

$$\forall x_1 \ldots \forall x_n \forall z_1 \ldots \forall z_m (\exists y/z_1 \ldots z_m)(\exists u/x_1 \ldots x_n)\, S x_1 \ldots x_n z_1 \ldots z_m y u. \qquad (5)$$

In models with pairing functions, it follows from the Krynicki normal form
theorem [15] for Henkin quantifiers that all IF formulas in prenex and negation
normal form can be brought into the form (5), which in turn is equivalent to the
Henkin quantifier (4). We use this result in the next subsection.

As with classical logic, IF formulas may be brought into Skolem normal
form. The Skolem normal forms of (5) and (4) are both as follows:

$$\exists f_1 \exists f_2 \forall x_1 \ldots x_n, z_1 \ldots z_m\; S x_1 \ldots x_n z_1 \ldots z_m f_1(x_1 \ldots x_n) f_2(z_1 \ldots z_m). \qquad (6)$$

These existential second-order formulas differ from the Skolem normal
forms of classical logic in that the Skolem functions $f_1, f_2$ in (6) have fewer
arguments than those of the classical, slash-free counterpart of (5) would (namely, the
arrays $z_1 \ldots z_m$ and $x_1 \ldots x_n$ are omitted, respectively, from $f_1$ and $f_2$ in (6)).
This is due to information hiding and the GTS of imperfect information for
formulas of IF logic. In such games, not all semantic information passes from
players moving earlier to players moving later. Taking arrays of Skolem functions
as winning strategies for one of the players of the semantic game, the existence
of winning strategies for the verifying player (resp. the falsifying player), or, in
Peirce’s terminology, the Graphist (resp. the Grapheus) in relation to EGs, agrees
with the notion of the truth (resp. falsity) of an IF formula in a given model.
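The role of the restricted Skolem functions can be checked by brute force on a small model. The sketch below treats the case n = m = 1 of (6) and searches for functions $f_1(x)$ and $f_2(z)$ over a finite domain; the helper names `functions` and `henkin_true`, and the encoding of S as a four-place Python predicate S(x, z, y, u), are assumptions made for the illustration.

```python
from itertools import product


def functions(domain):
    """Yield every function from the domain to itself, as a dict."""
    domain = list(domain)
    for values in product(domain, repeat=len(domain)):
        yield dict(zip(domain, values))


def henkin_true(domain, S):
    """Search for Skolem functions f1(x), f2(z) witnessing (6) with n = m = 1.

    Returns a pair (f1, f2) if the verifier has a winning strategy,
    and None otherwise.
    """
    domain = list(domain)
    for f1 in functions(domain):
        for f2 in functions(domain):
            if all(S(x, z, f1[x], f2[z]) for x in domain for z in domain):
                return f1, f2
    return None


# y must copy x and u must copy z: both identity functions work.
print(henkin_true([0, 1], lambda x, z, y, u: y == x and u == z))
# y must copy z, but f1 is not given z as an argument: no witness exists.
print(henkin_true([0, 1], lambda x, z, y, u: y == z))
```

The second call fails only because $f_1$ lacks the argument z; with the classical, slash-free prefix the same matrix would be satisfiable, which is the difference the paragraph above describes.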
3.2 IF Extensions of Alpha and Beta

Let me now proceed to the IF extension of EGs. Such an extension is possible
already for the ALPHA fragment. An example is the sentence (7), which is an
IF version of the formula in (1) in which disjunctions are now independent of
conjunction:

$$(S_1\,(\lor/\land)\,S_2) \land (S_3\,(\lor/\land)\,S_4). \qquad (7)$$

This is interpreted by parallel (imperfect information) moves by the Graphist 
choosing a disjunct and the Grapheus choosing a conjunct. Since the game- 
theoretic interpretation is endoporeutic, in the ALPHA graph in (1), a subgraph 
from one of the two disjoint molecular subgraphs is chosen first, before proceed- 
ing to any of the inner, contextually constrained graphs. 

One might wish to use some quirk in the syntax and stipulate that the choices
between the molecular and atomic subgraphs are independent. However, this
does not preserve the premise of iconicity, according to which symbols are
subordinate to graphical representation. My solution is to move from two- to
three-dimensional spaces to capture such independence. In order to have proper
independence between connectives, SAs need to be assembled into multiple layers
in three-dimensional space.

Accordingly, we diagrammatise (7) by the graph in (8): 




Since the interpretation of these graphs is endoporeutic, it does not matter which
one of the players, the Graphist or the Grapheus, is to choose first, as they
may move concurrently. The choice by the Graphist is (after two role-changes
prompted by the game rules for negation associated with the two nested cuts)
between two indices $i = l, r$ in subgraphs $\{G_1^l, G_1^r\}$, $\{G_2^l, G_2^r\}$, and the choice by the
Grapheus is likewise between the indices $j = l, r$ annotated to graphs $G_1$ and $G_2$
on spatially separated SAs. In other words, the Graphist and the Grapheus are
not informed of each other's choices. The semantic game by which this graph is
interpreted is thus of imperfect information.

The upshot is that the independent moves are not between subgraphs as 
such but between indices by which they are annotated. This is simply to prevent 
players using illicit information concerning the identity of atomic graphs in their 
strategic actions. 
We may further stipulate that (8) is a new four-place connective $W(\varphi, \ldots)$.

It gives rise to a partial logic and a truth-functionally complete set of connectives
when added to the classical set of connectives $\{\land, \lor, \sim\}$ [30, 31].

Let us now move to the BETA part. As in extending ALPHA to capture
independence between different connectives (different subsets of juxtaposed subgraphs),
to capture independence between quantifiers (LIs) we likewise move to
three-dimensional spaces. Thanks to the Krynicki normal form theorem, we may
confine ourselves to IF BETA graphs that have at most two manifolds (SAs) layered
in a space. (In what follows we toggle between IF formulas and formulas with
Henkin quantifiers at will. A caveat is that there is a difference between these
two symbolisations that comes to the fore in sect. 4, points (iii) and (iv), in
which more complex diagrammatic representations are at issue.)

One peculiarity of such IF graphs (as I will call them) is that spatially
separated sheets on which cuts and LIs are scribed are nevertheless connected by
atomic graphs, in other words by predicate terms (rhemas), because LIs on
separate sheets need to bind predicates on both sheets. Atomic graphs are hence
scribed as cylinders with a boundary corresponding to the periphery of rhemas.
A finite number of hooks are imagined on the two lids of these cylinders, upon
which LIs are attached. Atomic graphs become three-dimensional objects. One
way of looking at this is that atoms make SAs contextually dependent on each
other.

Let the relation $\equiv_s$ denote strong equivalence, in other words formulas or
graphs being true in the same models M and being false in the same models M,
and let $\equiv_w$ denote weak equivalence, in other words formulas or graphs being true
in the same models M. To start with a simple example, consider two formulas
equivalent in M in the following sense:

$$M \models \forall x (\exists y/x)\, Sxy \;\equiv_s\; M \models \exists y (\forall x/y)\, Sxy \;\equiv_w\; M \models \exists y \forall x\, Sxy. \qquad (9)$$
These formulas are diagrammatised by weakly equivalent IF EGs:



[Figure: the two weakly equivalent IF EGs.] (10)



The graph on the left represents the two strongly equivalent formulas of (9).
Dashed circles depict the lids, the peripheries of the rhemas S upon which LIs
are attached. The heavy dotted line on the left graph represents the periphery
of such a three-dimensional cut. As the order of the quantifiers does not matter
for the truth-value of these formulas, the order in which the two LIs on the left
graph in (10) are selected is immaterial.
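The difference between weak and strong equivalence in (9) can be seen by enumerating all binary relations S on a two-element domain. The following sketch is only an illustration under the usual game-theoretic reading (truth as a winning strategy for the verifier, falsity as one for the falsifier, and classical falsity as non-truth); the names `if_true`, `if_false`, `classical_true` and `classical_false` are hypothetical.

```python
from itertools import product

DOMAIN = (0, 1)
PAIRS = list(product(DOMAIN, repeat=2))


def if_true(S):
    # forall x (exists y / x) Sxy: the verifier needs one y good for all x.
    return any(all((x, y) in S for x in DOMAIN) for y in DOMAIN)


def if_false(S):
    # The falsifier needs one x that defeats every y.
    return any(all((x, y) not in S for y in DOMAIN) for x in DOMAIN)


def classical_true(S):
    # exists y forall x Sxy, evaluated classically.
    return any(all((x, y) in S for x in DOMAIN) for y in DOMAIN)


def classical_false(S):
    # Classically, a formula is false exactly when it is not true.
    return not classical_true(S)


# All 16 binary relations on the domain.
relations = [
    frozenset(p for p, keep in zip(PAIRS, bits) if keep)
    for bits in product((False, True), repeat=len(PAIRS))
]

same_truth = all(if_true(S) == classical_true(S) for S in relations)
falsity_gap = [S for S in relations if if_false(S) != classical_false(S)]
print(same_truth, frozenset({(0, 0), (1, 1)}) in falsity_gap)
```

The two formulas are true in exactly the same relations, but the identity relation is one where the classical formula is false while the slashed formula is not, so the equivalence is weak rather than strong.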

In a similar vein, and to take a slightly more complex example, consider
the well-known ‘relative-villager’ sentence used in [7] to demonstrate the
existence of branching (Henkin) quantification in natural language: “Some relative of
each villager and some friend of each townsman hate each other”. This sentence
is customarily symbolised (with self-explanatory abbreviations) by the following
IF first-order formula:

$$\forall x \exists y (\forall z/xy)(\exists u/xy)\,[\sim(Vx \land Tz \land Rxy \land Fzu) \lor (Hyu \land Huy)] \qquad (11)$$

This sentence is diagrammatically captured by the IF BETA EG with two spatially
separated SAs: 



Dashed ovals depict the area within which I have ignored, for the sake of
expediency, the fine details of the structure of the matrix part of the sentence that
is in brackets in (11).

All these diagrams are interpreted via GTS of imperfect information. 

4 Remarks on IF Existential Graphs 

Let me delineate five key points concerning the new system of existential graphs.

(i) Having exactly three-dimensional spaces is not the crucial feature of IF EGs:
we may as well scribe unextended BETA on a three-dimensional Space of
Assertion, in which case negation would correspond to closed spheres, and
quantification, identity, predication and subsumption would correspond to
Hyperplanes of Identity dissecting the spheres. Its IF extension would then
occur in four dimensions, in other words the Spaces of Assertion would
be layered in a four-dimensional space in which predicate terms are
four-dimensional objects connecting these layers. The process generalises to n
dimensions having its IF extension in n + 1 dimensions. 4

(ii) The spatial arrangement of LIs shows what the two notions of scope that [11]
has distinguished amount to. Binding scope refers to the reach of
quantification in a formula in the sense of binding different tokens of variables.
It corresponds to the prolongation of LIs from end to end. Topologically, it
expresses connectivity of different subgraphs. In contrast to binding scope,
priority scope refers to the logical ordering between different quantifiers. It
corresponds to the order given by the number of cuts within which the
outermost extremity of the line or a plane occurs. The least number of cuts
defines the logically prioritised component.

In classical logic, and a fortiori in ordinary BETA graphs, these two notions
are entangled with each other. In IF logic and in IF EGs, they are separated.

4 From four dimensions onwards, a caveat is that the topological manifolds may be 
non-smooth, i.e. non-differentiable. This may affect permissible transformation rules. 




(12) 
One may ask whether we have reached a comprehensive diagrammatic
method of representing independence. For instance, how are we to
represent (13), in which all quantifiers not explicitly slashed are assumed to
be dependent on those occurring further out?



$$\forall x \exists y \forall z (\exists u/x)\, Sxyzu. \qquad (13)$$

The following is an IF EG diagrammatisation of (13):

[Figure: IF EG diagrammatising (13).] (14)



It is seen here that functional dependency, symbolically captured by priority
scope, refers to the iconic property of the nesting of cylindric enclosures.
Furthermore, if there is an LI x on which the LIs y, z drawn on the separated SAs
both depend (in the sense of x being logically prior to both y and z), we
need to assume that the outermost LI x actually extends to a plane that
dissects the cuts (cuts will be represented as cylinders). LIs are connected
to prolonged hooks on the periphery of spots. Let us call such Planes of
Identity curtains. For instance, the Henkin quantifier sentence (15) is
represented by the IF EG (16) with one curtain (the bold rectangle) spanning
the space that has one side connected to the periphery of the spot S and
one loose side not connected to any spot.



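$$\sim\forall u \begin{pmatrix} \forall x\, \exists y \\ \forall z\, \exists w \end{pmatrix} Suxyzw. \qquad (15)$$

[Figure: IF EG for (15), with one curtain.] (16)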



The operation that the cut defines is not just the ordinary Boolean negation.
If we distinguish from each other the boundary of a cut and its area,
as Peirce in effect did, what we get is the distinction between closed (cut
enclosure) and open (area of the cut) sets. 5 A closed set contains its boundary,
and hence the complement of the closed set is open. Likewise, an open
set consists of the interior without the boundary. Its complement is a closed
set.

If we think of the structure of graphs in this basic topological way, then the
cut defines complementation, in other words a negative operation that is the
weak, contradictory one. The cut would thus correspond to classical negation.
This is fine, as far as it goes. However, what if a graph that is enclosed by the
cut is without a truth value (which may happen in case the predicates in it are
partially interpreted [30])? In other words, what does it mean to sever an
undefined atomic graph from the SA? It ought to mean nothing. Cutting an undefined
graph does not affect the truth-value of its complement. Likewise, prefixing
negation to an undefined predicate ought not to change it to a true predicate. It does
so only if the negation is weak, for which we do not have a corresponding iconic
notation different from strong negation. Accordingly, the iconic notation of cut
actually denotes strong negation, which does not tamper with undefined graphs
but only changes the role of the player entering the enclosure.

Peirce erroneously thought that this transfer of responsibility between the 
advocate and the opponent delivers the same concept of negation as the set- 
theoretic meaning of the cut. It does not. There are two roles that the cut may 
be taken to play. It may obstruct the movement and thus the continuity of the 
ligature that abuts it, so that one may distinguish what is the interior and what is 
the exterior of a cut. The interior is severed from the SA and is not on a par with 
it. This means that what is scribed on that area is not asserted. The operation 
produces the denial of a proposition. In this sense it corresponds to classical, 
weak negation (‘It is not the case that p’). The other role of a cut is that when
the player who is interpreting the graph enters the enclosure of the cut, his or her
strategic role changes to that of his or her opponent. This delivers
game-theoretic, strong, geometric negation. These two different interpretations
of a cut give rise to two different notions of negating a statement. 6

Therefore, cuts in IF BETA graphs do not denote complementation, and thus
do not correspond to classical negation, because complementation is confined to
those local SAs on which they are scribed. For this reason, the interpretation of
cuts is in terms of game-theoretic role reversals.

5 Conclusions 

With his diagrammatic logic, Peirce set out to improve logical analysis not
only in the sense of having n-place predicates (spots with n hooks) and not only

5 Since Peirce repudiated set theory in its Cantorian sense and wished to replace it with 
infinitesimals and collections and so with non-standard models, this terminology is 
of course moot. But it brings out the fundamental difference that these two notions 
of sets, open and closed, may logically give rise to. 

6 A similar notion of negation in terms of switching questions and answers is in use in
linear logic, which thus has a venerable predecessor in Peirce’s system.
monadic predicates (properties), but also in the sense that the blanks may be
filled not only by proper names but also by indefinites, which he called
“onomas” (MS 491, c.1903). By so doing he anticipated the discourse referent idea
of the discourse-representation theory [14], in which the value of a pronoun or 
an indefinite is derived from the discourse context and the common ground of 
the interlocutors. The idea is also related to Peirce’s semeiotic concept of the 
dynamic object being constructed and nurtured in the interaction between the 
utterer and the interpreter of the given assertion. Indeed, assertions scribed on 
SAs are assertions within some context from which their components derive val- 
ues. A similar idea is found in GTS in which the needed values may be picked 
among the derivational histories of the game tree produced by the endoporeutic 
interpretation of a graph or a logical formula, and possibly supplemented by 
some environmental (deictic) parameters initially assigned to these histories. 

Above all, the graphical representation and dialogical interpretation of as- 
sertions were the two key elements in Peirce’s diagrammatic method to unearth 
the dynamic content of thought and cognition. By walking further along the 
path of diagrammatisation than Peirce himself did, we put his anticipations into 
a sharper perspective: “A picture is [a] visual representation of the relations 
between the parts of its objects; a vivid and highly informative representation, 
rewarding somewhat close examination. Yet ... it cannot directly exhibit all the 
dimensions of its object, be this physical or psychic. It shows this object only 
under a certain light, and from a single point of view” (MS 300: 22-23, 1905). 

What Peirce nonetheless initiated was a notable break-off from the linear 
confines of language. He added polyphony to the tenor of language, so to speak. 
Logical methods need not protract in time, as a series of symbolic expressions and 
their manipulation, but allow representations as “diagrams upon a surface” (MS 
654: 6, 1910) with continuous deformations performed upon them. He continued, 
remarkably: “Three dimensions are necessary and sufficient for the expression of 
all assertions; so that, if man’s reason was originally limited to the line of speech 
(which I do not affirm), it has now outgrown the limitation” (MS 654: 6-7). 

This prediction holds good for our tridimensional extension of diagrams with 
imperfect information. In capturing independence and richer notions of context 
by scribing EGs in assertion spaces of higher dimensions we have improved on 
both Peirce’s own diagrammatic and the old Frege-Russell concept of logic. In 
this sense, IF EGs, rather than the original two-dimensional ones that Peirce
inaugurated, may turn out to be our true elementary diagrammatic systems. 

Analogous extensions may be implemented for other diagrammatic represen- 
tations, including those that improve on and reconstruct Peirce’s own GAMMA 
compartments (for related modal extensions, see [19, 23]), as well as the more 
recent heir to EGs, the systems of conceptual graphs [33] and their offspring. 
Abstract. Spider diagrams are a visual notation for expressing logical
statements. In this paper we identify a well known fragment of first order
predicate logic, that we call $\mathcal{ESD}$, equivalent in expressive power to the
spider diagram language. The language $\mathcal{ESD}$ is monadic and includes
equality but has no constants or function symbols. To show this equivalence,
in one direction, for each diagram we construct a sentence in $\mathcal{ESD}$
that expresses the same information. For the more challenging converse
we show there exists a finite set of models for a sentence S that can be
used to classify all the models for S. Using these classifying models we
show that there is a diagram expressing the same information as S.



1 Introduction 

Euler diagrams [2] exploit topological properties of enclosure, exclusion and
intersection to represent subset, disjoint sets and set intersection respectively.
Diagram $d_1$ in figure 1 is an Euler diagram and expresses that nothing is both
a car and a van. Venn diagrams [13] are similar to Euler diagrams. In Venn
diagrams, all possible intersections between contours must occur and shading is
used to represent the empty set. Diagram $d_2$ in figure 1 is a Venn diagram and
also expresses that no element is both a car and a van.

Many visual languages have emerged that extend Euler and Venn diagrams.
One such language is Venn-II introduced by Shin [9]. Diagram $d_3$ in figure 1
is a Venn-II diagram. In addition to what is expressed by the underlying Venn
diagram, it also expresses, using an x-sequence, that the set $Cars \cup Vans$ is not empty.
Venn-II diagrams can express whether a set is empty or not empty. Shin [9] shows
that Venn-II is equivalent in expressive power to a first order language that she
calls $\mathcal{L}_0$. The language $\mathcal{L}_0$ is a pure monadic language (i.e. all the predicate
symbols are ‘one place’) that does not include constants or function symbols.

Another visual language, called Euler/Venn, based on Euler diagrams is dis- 
cussed in [12]. 

These diagrams are similar to Venn-II diagrams but, instead of x-sequences,
constant sequences are used. Diagram $d_4$ in figure 2 is an Euler/Venn diagram
and expresses that no element is both a car and a van and that there is something
called ‘ford’ that is either a car or a van. In [12] Swoboda and Allwein give
an algorithm that determines whether a given monadic first order formula is
observable from a given diagram. If the formula is observable from the diagram
then it may contain weaker information than the diagram (i.e. the formula is
a consequence of the information contained in the diagram).

Fig. 1. An Euler diagram, Venn diagram and a Venn-II diagram

Fig. 2. An Euler/Venn diagram and two spider diagrams

Like Euler/Venn diagrams, spider diagrams are based on Euler diagrams.
Rather than allowing the use of constant sequences 1 as in Euler/Venn diagrams,
spiders denote the existence of elements. The spider diagram $d_5$ in figure 2
expresses that no element is both a car and a van and there are at least two
elements, one is a car and the other is a car or a van. The spider diagram $d_6$
expresses that there are exactly three vans that are not cars. By allowing lower
and upper bounds (by the use of shading and spiders) to be placed on the
cardinality of sets, spider diagrams increase expressiveness over Venn-II.

We show, but do not include any proofs, that the spider diagram language is
equivalent in expressive power to a fragment of first order logic that we call $\mathcal{ESD}$
(for the Expressiveness of Spider Diagrams). The language $\mathcal{ESD}$ extends $\mathcal{L}_0$ by
adding equality, so $\mathcal{ESD}$ is monadic predicate logic with equality.

In section 5, we address the task of mapping each diagram to a sentence
expressing the same information, showing that spider diagrams are at most as
expressive as $\mathcal{ESD}$. In section 6 we show that $\mathcal{ESD}$ is at most as expressive as
spider diagrams. We will outline Shin’s algorithmic approach to show that $\mathcal{L}_0$ (in
which there is no equality) is not more expressive than Venn-II. It is simple to
adapt this algorithm to find a spider diagram that expresses the same
information as a sentence in $\mathcal{ESD}$ that does not involve equality. However, for sentences
in $\mathcal{ESD}$ that do involve equality, the algorithm does not readily generalize.

1 In some spider diagram languages, given spiders [5] represent constants but for
our purposes spiders represent existential quantification.

Thus, the task of showing that there exists a diagram expressing the same 
information as a sentence involving equality is challenging and we take a different 
approach. To motivate our approach we consider relationships between models 
for diagrams. We consider the models for a sentence and show that there is 
a finite set of models that can be used to classify all the models for the sentence. 
These classifying models can then be used to construct a diagram that expresses 
the same information as the sentence. 

2 Spider Diagrams 

In diagrammatic systems, there are two levels of syntax: concrete (or token) 
syntax and abstract (or type) syntax [4]. Concrete syntax captures the physical
representation of a diagram. Abstract syntax ‘forgets’ semantically unimportant 
spatial relations between syntactic elements in a concrete diagram. We include 
the concrete syntax to aid intuition but we work at the abstract level. 

2.1 Informal Concrete Syntax 

A contour is a simple closed plane curve. Each contour is labelled. A bound- 
ary rectangle properly contains all contours. The boundary rectangle is not 
a contour and is not labelled. A basic region is the bounded area of the plane 
enclosed by a contour or the boundary rectangle. A region is defined recursively
as follows: any basic region is a region; if $r_1$ and $r_2$ are regions then the union,
intersection and difference of $r_1$ and $r_2$ are regions provided these are non-empty. A
zone is a region having no other region contained within it. A region is shaded
if each of its component zones is shaded. A spider is a tree with nodes (called
feet) placed in different zones. The connecting edges (called legs) are straight
lines. A spider touches a zone if one of its feet appears in that zone. A spider
is said to inhabit the region which is the union of the zones it touches. This 
union is called the habitat of the spider. 

A concrete unitary (spider) diagram is a single boundary rectangle to- 
gether with a finite collection of contours, shading and spiders. No two contours 
in the same unitary diagram can have the same label. 

Example 1. Spider diagram $d_6$ in figure 2 has two contours and four zones. The
shaded zone contains three spiders, each with one foot.

2.2 Formal Abstract Syntax 

We can think of the contour labels used in our diagrams as being chosen from 
a countably infinite set, C. 

Definition 1. An abstract unitary (spider) diagram, d, (with labels in $\mathcal{C}$)
is a tuple $\langle L, Z, Z^*, SI \rangle$ whose components are defined as follows.

1. $L = L(d) \subset \mathcal{C}$ is a finite set of contour labels.

2. $Z = Z(d) \subseteq \{(a, L - a) : a \subseteq L\}$ is a set of zones such that

(i) for each label $l \in L$ there is a zone $(a, L - a) \in Z(d)$ such that $l \in a$ and

(ii) the zone $(\emptyset, L)$ is in $Z(d)$.

3. $Z^* = Z^*(d) \subseteq Z$ is a set of shaded zones.

4. $SI = SI(d) \subset \mathbb{Z}^+ \times (\mathbb{P}Z - \{\emptyset\})$ is a finite set of spider identifiers such
that

$$\forall (n_1, r_1), (n_2, r_2) \in SI \bullet r_1 = r_2 \Rightarrow n_1 = n_2.$$

If $(n, r) \in SI$ we say there are n spiders with habitat r.

When we reason with a spider diagram, the contour set may change, which is 
why we define an abstract zone to be a pair. Zone (a, b) is included in a but 
not included in b. Every contour in a concrete diagram contains at least one 
zone, captured by condition 2 (i). In any concrete diagram, the zone inside 
the boundary rectangle but outside all the contours is present, captured by 
condition 2 (ii). In order to give a unique abstraction from a concrete diagram 
we use spider identifiers (essentially a bag of spiders) rather than an arbitrary 
set of spiders. 

Example 2. Diagram $d_1$ in figure 3 has abstract description

1. contour labels $\{A, B\}$,

2. zones $\{(\emptyset, \{A, B\}), (\{A\}, \{B\}), (\{B\}, \{A\}), (\{A, B\}, \emptyset)\}$,

3. shaded zones $\{(\{B\}, \{A\})\}$ and

4. spider identifiers $\{(1, \{(\{B\}, \{A\})\}), (1, \{(\{A\}, \{B\}), (\{B\}, \{A\})\})\}$.
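The abstract description in Example 2 can be written down directly as data. The sketch below is one possible Python encoding, not the authors' implementation: the names `zone`, `UnitaryDiagram` and `is_well_formed` are hypothetical, zones are pairs of frozensets of labels, and the check simply restates the four conditions of the definition of an abstract unitary diagram for a finite description.

```python
from dataclasses import dataclass


def zone(inside, outside):
    """An abstract zone (a, b): labels it lies inside and labels it lies outside."""
    return (frozenset(inside), frozenset(outside))


@dataclass(frozen=True)
class UnitaryDiagram:
    labels: frozenset   # L, the contour labels
    zones: frozenset    # Z, the zones present in the diagram
    shaded: frozenset   # Z*, the shaded zones
    spiders: frozenset  # SI, pairs (n, habitat) with habitat a frozenset of zones


def is_well_formed(d):
    """Check the conditions of the abstract syntax on a finite description."""
    zones_ok = all(a <= d.labels and b == d.labels - a for a, b in d.zones)
    labels_used = all(any(l in a for a, _ in d.zones) for l in d.labels)
    outside_present = zone([], d.labels) in d.zones
    shaded_ok = d.shaded <= d.zones
    habitats = [r for _, r in d.spiders]
    spiders_ok = (
        all(n >= 1 and len(r) > 0 and r <= d.zones for n, r in d.spiders)
        and len(habitats) == len(set(habitats))  # at most one count per habitat
    )
    return zones_ok and labels_used and outside_present and shaded_ok and spiders_ok


# Example 2: the diagram d1 with contours A and B.
z_out, z_a = zone([], ["A", "B"]), zone(["A"], ["B"])
z_b, z_ab = zone(["B"], ["A"]), zone(["A", "B"], [])
d1 = UnitaryDiagram(
    labels=frozenset({"A", "B"}),
    zones=frozenset({z_out, z_a, z_b, z_ab}),
    shaded=frozenset({z_b}),
    spiders=frozenset({(1, frozenset({z_b})), (1, frozenset({z_a, z_b}))}),
)
print(is_well_formed(d1))
```

Storing spiders as (count, habitat) pairs mirrors the use of spider identifiers as a bag of spiders: two spiders with the same habitat are recorded by raising the count rather than by adding a second entry.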

We define, for unitary diagram d, the Venn zone set to be $VZ(d) = \{(a, b) : a \subseteq L(d) \land b = L(d) - a\}$. If $Z(d) = VZ(d)$ then d is said to be in Venn form.
If $z \in MZ(d) = VZ(d) - Z(d)$ then z is missing from d. Spiders represent the
existence of elements and regions (an abstract region is a set of zones) represent
sets; thus we need to know how many elements we have represented in each
region. The number of spiders inhabiting region $r_1$ in d is denoted by $S(r_1, d)$.
The number of spiders touching $r_1$ in d is denoted by $T(r_1, d)$; for more details
see [6]. In $d_1$, figure 3, $(\{B\}, \{A\})$ is inhabited by one spider and touched by two
spiders.
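The quantities just introduced are easy to compute from an abstract description. In the sketch below the function names are hypothetical, and two readings are assumed rather than taken from the text: a spider is counted as inhabiting a region when its habitat is contained in that region, and as touching a region when its habitat shares at least one zone with it. Both readings agree with the counts stated for the zone ({B}, {A}).

```python
from itertools import combinations


def zone(inside, outside):
    """An abstract zone (a, b) as a pair of frozensets of labels."""
    return (frozenset(inside), frozenset(outside))


def venn_zones(labels):
    """VZ(d): every split of the label set into (a, L - a)."""
    labels = frozenset(labels)
    return {
        zone(a, labels - frozenset(a))
        for k in range(len(labels) + 1)
        for a in combinations(sorted(labels), k)
    }


def missing_zones(labels, zones):
    """MZ(d) = VZ(d) - Z(d)."""
    return venn_zones(labels) - set(zones)


def inhabiting(region, spiders):
    """S(r, d): spiders whose habitat lies inside the region (assumed reading)."""
    return sum(n for n, habitat in spiders if habitat <= region)


def touching(region, spiders):
    """T(r, d): spiders whose habitat meets the region (assumed reading)."""
    return sum(n for n, habitat in spiders if habitat & region)


z_out, z_a = zone([], ["A", "B"]), zone(["A"], ["B"])
z_b, z_ab = zone(["B"], ["A"]), zone(["A", "B"], [])
spiders = {(1, frozenset({z_b})), (1, frozenset({z_a, z_b}))}

print(inhabiting(frozenset({z_b}), spiders), touching(frozenset({z_b}), spiders))
print(missing_zones({"A", "B"}, {z_out, z_a, z_b}))
```

The last line describes a diagram in which the contours A and B do not overlap, so the zone inside both contours is reported as missing.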

Unitary diagrams form the building blocks of compound diagrams. If $D_1$
and $D_2$ are spider diagrams then so are $(D_1 \sqcup D_2)$ (“$D_1$ or $D_2$”) and $(D_1 \sqcap D_2)$
(“$D_1$ and $D_2$”). Some diagrams are not satisfiable and we introduce the symbol
$\bot$, defined to be a unitary diagram interpreted as false. Our convention will be
to denote unitary diagrams by d and arbitrary diagrams by D.

2.3 Semantics 

Regions in spider diagrams represent sets. We can express lower and, in the 
case of shaded regions, upper bounds on the cardinalities of the sets we are 
representing as follows. If region r is inhabited by n spiders in diagram d then d 
expresses that the set represented by r contains at least n elements. If r is shaded 
and touched by m spiders in d then d expresses that the set represented by r 
contains at most m elements. Thus, if d has a shaded, untouched region, r, 
then d expresses that r represents the empty set. Missing zones also represent 
the empty set. To formalize the semantics we shall map contour labels, zones 
and regions to subsets of some universal set. We assume that no contour label 
is a zone or region and that no zone is a region (regions are sets of zones). We
define $\mathcal{Z}$ and $\mathcal{R}$ to be the sets of all abstract zones and regions respectively.

Definition 2. An interpretation of contour labels, zones and regions, or
simply an interpretation, is a pair $(U, \Psi)$ where U is a set and $\Psi : \mathcal{C} \cup \mathcal{Z} \cup \mathcal{R} \to \mathbb{P}U$
is a function mapping contour labels, zones and regions to subsets of U such
that the images of the zones and regions are completely determined by the images
of the contour labels as follows:

1. for each zone $(a, b)$, $\Psi(a, b) = \bigcap_{l \in a} \Psi(l) \cap \bigcap_{l \in b} \overline{\Psi(l)}$ where $\overline{\Psi(l)} = U - \Psi(l)$ and
we define $\bigcap_{l \in \emptyset} \Psi(l) = U = \bigcap_{l \in \emptyset} \overline{\Psi(l)}$ and

2. for each region r, $\Psi(r) = \bigcup_{z \in r} \Psi(z)$ and we define $\Psi(\emptyset) = \bigcup_{z \in \emptyset} \Psi(z) = \emptyset$.

We introduce a semantics predicate which identifies whether a diagram expresses 
a true statement, with respect to an interpretation. 

Definition 3. Let D be a diagram and let $m = (U, \Psi)$ be an interpretation. If
$D = \bot$ then the semantics predicate, $P_D(m)$, is $\bot$. If $D$ ($\neq \bot$) is a unitary
diagram then the semantics predicate, $P_D(m)$, of D is the conjunction of the
following three conditions.

(i) Distinct Spiders Condition. For each region r in $\mathbb{P}Z(D) - \{\emptyset\}$, $|\Psi(r)| \geq S(r, D)$.

(ii) Shading Condition. For each shaded region r in $\mathbb{P}Z^*(D) - \{\emptyset\}$, $|\Psi(r)| \leq T(r, D)$.

(iii) Missing Zones Condition. Any zone, z, in $MZ(D)$ satisfies $\Psi(z) = \emptyset$.

If $D = D_1 \sqcup D_2$ then the semantics predicate, $P_D(m)$, of D is $P_D(m) = P_{D_1}(m) \lor P_{D_2}(m)$. If $D = D_1 \sqcap D_2$ then the semantics predicate, $P_D(m)$,
of D is $P_D(m) = P_{D_1}(m) \land P_{D_2}(m)$. We say m satisfies D, denoted $m \models D$,
if and only if $P_D(m)$ is true. If $m \models D$ we say m is a model for D.

Example 3. Interpretation $m = (\{1, 2\}, \Psi)$ partially defined by $\Psi(A) = \{1\}$ and
$\Psi(B) = \{2\}$ is a model for $d_1$ in figure 3 but not for $d_2$.
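The three conditions of the semantics predicate can be evaluated mechanically for a unitary diagram over a finite universe. The sketch below is a hedged illustration: the function names are hypothetical, the interpretation function is passed as a dict `psi` from contour labels to sets, and spiders inhabiting or touching a region are counted by habitat containment and habitat overlap respectively, which is an assumed reading of S and T. The diagram used is the abstract description of $d_1$ given in Example 2.

```python
from itertools import combinations


def zone(inside, outside):
    """An abstract zone (a, b) as a pair of frozensets of labels."""
    return (frozenset(inside), frozenset(outside))


def zone_image(z, universe, psi):
    """Image of a zone: elements in every label of a and in no label of b."""
    a, b = z
    return {
        x for x in universe
        if all(x in psi[l] for l in a) and all(x not in psi[l] for l in b)
    }


def region_image(region, universe, psi):
    """Image of a region: union of the images of its zones."""
    image = set()
    for z in region:
        image |= zone_image(z, universe, psi)
    return image


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = list(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def satisfies(universe, psi, labels, zones, shaded, spiders):
    """Evaluate the semantics predicate for a unitary diagram."""
    labels = frozenset(labels)
    for r in nonempty_subsets(zones):    # distinct spiders condition
        s = sum(n for n, h in spiders if h <= r)
        if len(region_image(r, universe, psi)) < s:
            return False
    for r in nonempty_subsets(shaded):   # shading condition
        t = sum(n for n, h in spiders if h & r)
        if len(region_image(r, universe, psi)) > t:
            return False
    venn = {
        zone(a, labels - frozenset(a))
        for k in range(len(labels) + 1)
        for a in combinations(sorted(labels), k)
    }
    for z in venn - set(zones):          # missing zones condition
        if zone_image(z, universe, psi):
            return False
    return True


z_out, z_a = zone([], ["A", "B"]), zone(["A"], ["B"])
z_b, z_ab = zone(["B"], ["A"]), zone(["A", "B"], [])
d1 = dict(
    labels={"A", "B"},
    zones={z_out, z_a, z_b, z_ab},
    shaded={z_b},
    spiders={(1, frozenset({z_b})), (1, frozenset({z_a, z_b}))},
)

print(satisfies({1, 2}, {"A": {1}, "B": {2}}, **d1))
print(satisfies({1, 2, 3, 4}, {"A": set(), "B": {2, 3, 4}}, **d1))
```

The first interpretation is the one from Example 3 and satisfies all three conditions. The second is an invented counterexample: it places three elements in the shaded zone ({B}, {A}), which is touched by only two spiders, so the shading condition fails.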
3 The Language $\mathcal{ESD}$

Spider diagrams can express statements of the form ‘there are at least n elements
in A’ and ‘there are at most m elements in A’. A first order language equivalent
in expressive power to the spider diagram language will involve equality, to allow
us to express the distinctness of elements, and monadic predicates, to allow us to
express $x \in A$. In order to define such a language we require a countably infinite
set of monadic predicate symbols, $\mathcal{P}$, from which all monadic predicate symbols
will be drawn.

Definition 4. The first order language $\mathcal{ESD}$ (for Expressiveness of Spider
Diagrams) consists of the following.

1. Variables, $x_1, x_2, \ldots$ of which there are countably many.

2. Atomic formulae,

(a) if $x_i$ and $x_j$ are variables then $(x_i = x_j)$ is an atomic formula,

(b) if $P_i \in \mathcal{P}$ and $x_j$ is a variable then $P_i(x_j)$ is an atomic formula.

3. Formulae, which are defined inductively.

(a) Atomic formulae are formulae.

(b) $\bot$ and $\top$ are formulae.

(c) If $p$ and $q$ are formulae so are $(p \land q)$, $(p \lor q)$ and $\neg p$.

(d) If $p$ is a formula and $x$ is a variable then $(\forall x\, p)$ and $(\exists x\, p)$ are formulae.

We define $\mathcal{VAR}$, $\mathcal{F}$ and $\mathcal{S}$ to be the sets of variables, formulae and sentences
(formulae with no free variables) of the language $\mathcal{ESD}$ respectively.

We shall assume the standard first order predicate logic semantic interpretation 
of formulae in this language, with the exception of allowing a structure to have 
an empty domain. 

4 Structures and Interpretations 

We wish to identify when a diagram and a sentence express the same information.
To help us formalize this notion, we map interpretations to structures in such
a way that information is preserved. Throughout we shall assume, without loss
of generality, that $\mathcal{C} = \{L_1, L_2, \ldots\}$ and $\mathcal{P} = \{P_1, P_2, \ldots\}$. Define $\mathcal{U}$ to be the
class of all sets. The sets in $\mathcal{U}$ form the domains of structures in the language
$\mathcal{ESD}$.

Definition 5. Define $\mathcal{INT}$ to be the class of all interpretations for spider
diagrams, that is

$\mathcal{INT} = \{(U, \Psi) : U \in \mathcal{U} \land \Psi : \mathcal{C} \cup \mathcal{Z} \cup \mathcal{R} \to \mathbb{P}U\}$.

Define also $\mathcal{STR}$ to be the class of structures for the language $\mathcal{ESD}$, that is

$\mathcal{STR} = \{m : U \in \mathcal{U} \land m = \langle U, =^m, P_1^m, P_2^m, \ldots \rangle\}$,

where $P_i^m$ is the interpretation of $P_i$ in the structure $m$ (that is, $P_i^m \subseteq U$) and
we always interpret $=$ as the diagonal subset of $U \times U$, denoted $\mathrm{diag}(U \times U)$.

Fig. 4. Two α-diagrams: from diagrams to sentences

We define a bijection, $h : \mathcal{INT} \to \mathcal{STR}$ by

$h(U, \Psi) = \langle U, \mathrm{diag}(U \times U), \Psi(L_1), \Psi(L_2), \ldots \rangle$.



Definition 6. Let $D$ be a diagram and $S$ be a sentence. We say $D$ and $S$ are
expressively equivalent if and only if $h$ provides a bijective correspondence
between their models, that is

$\{h(I) : I \in \mathcal{INT} \land I \models D\} = \{m \in \mathcal{STR} : m \models S\}$.

5 Mapping from Diagrams to Sentences 

To show that the spider diagram language is not more expressive than $\mathcal{ESD}$,
we will map diagrams to expressively equivalent sentences. An α-diagram is
a spider diagram in which all spiders inhabit exactly one zone [8]. Such diagrams
have nice properties, for example, unitary α-diagrams only contain conjunctive
information (spider legs represent disjunctive information).

Theorem 1. Let $D_1$ be a spider diagram. There exists a spider diagram, $D_2$,
that is a disjunction of unitary α-diagrams and semantically equivalent to $D_1$
(i.e. $D_1$ and $D_2$ have the same models).

We will map each unitary α-diagram to an expressively equivalent sentence in
$\mathcal{ESD}$. This enables us to map each disjunction of unitary α-diagrams to an
expressively equivalent sentence and, by theorem 1, this is sufficient to show that
the spider diagram language is not more expressive than the language $\mathcal{ESD}$.

Example 4. In diagram $d_1$, figure 4, there are three spiders, one outside both $L_1$
and $L_2$, the other two inside $L_2$ and outside $L_1$. Diagram $d_1$ is expressively
equivalent to the sentence

$(\exists x_1\, \neg P_1(x_1) \land \neg P_2(x_1)) \land (\exists x_1 \exists x_2\, P_2(x_1) \land P_2(x_2) \land \neg P_1(x_1) \land \neg P_1(x_2) \land x_1 \neq x_2)$.

In diagram $d_2$, no elements can be in $L_3$ and not in $L_1$, so $d_2$ is expressively
equivalent to the sentence

$\forall x_1\, \neg(P_3(x_1) \land \neg P_1(x_1))$.

The disjunction of these sentences is expressively equivalent to $d_1 \sqcup d_2$. For
general $d_1$ and $d_2$, the disjunction of their expressively equivalent sentences is
expressively equivalent to $d_1 \sqcup d_2$.

To construct sentences for diagrams, it is useful to map zones to formulae. 

Definition 7. Define function $\mathcal{ZF} : \mathcal{Z} \times \mathcal{VAR} \to \mathcal{F}$ ($\mathcal{ZF}$ for ‘zone formula’)
by, for each $(a,b) \in \mathcal{Z} - \{(\emptyset,\emptyset)\}$ and variable $x_j$,

$\mathcal{ZF}((a,b), x_j) = \bigwedge_{L_k \in a} P_k(x_j) \land \bigwedge_{L_k \in b} \neg P_k(x_j)$

and $\mathcal{ZF}((\emptyset,\emptyset), x_j) = \top$.

We use the function $\mathcal{ZF}$ to construct a sentence of $\mathcal{ESD}$ for each zone in
a unitary α-diagram. We shall take these zone sentences in conjunction to give
a sentence for the diagram. We define $\mathcal{D}_0$ to be the class of all unitary α-diagrams
and $\mathcal{D}^{\alpha}$ to be the class of all disjunctions of unitary α-diagrams.

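Definition 8. The partial function $\mathcal{ZS} : \mathcal{Z} \times \mathcal{D}_0 \to \mathcal{S}$ ($\mathcal{ZS}$ for ‘zone sentence’)
is specified for unitary α-diagram $d$ and zone $z$ in $VZ(d)$ as follows.

1. If $z$ is not shaded and not inhabited by any spiders then $\mathcal{ZS}(z,d) = \top$.

2. If $z$ is not shaded and inhabited by $n > 0$ spiders then

$\mathcal{ZS}(z,d) = \exists x_1 \ldots \exists x_n \big( \bigwedge_{1 \leq j < k \leq n} \neg(x_k = x_j) \land \bigwedge_{1 \leq k \leq n} \mathcal{ZF}(z, x_k) \big)$.

3. If $z$ is shaded or missing and not inhabited by any spiders then

$\mathcal{ZS}(z,d) = \forall x_1\, \neg\mathcal{ZF}(z, x_1)$.

4. If $z$ is shaded and inhabited by $n > 0$ spiders then

$\mathcal{ZS}(z,d) = \exists x_1 \ldots \exists x_n \big( \bigwedge_{1 \leq j < k \leq n} \neg(x_k = x_j) \land \bigwedge_{1 \leq k \leq n} \mathcal{ZF}(z, x_k) \land \forall x_{n+1} ( \bigvee_{1 \leq j \leq n} x_{n+1} = x_j \lor \neg\mathcal{ZF}(z, x_{n+1}) ) \big)$.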

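Definition 9. Define $\mathcal{DS} : \mathcal{D}^{\alpha} \to \mathcal{S}$ ($\mathcal{DS}$ for ‘diagram sentence’) as follows.
Let $D$ be a disjunction of unitary α-diagrams.

1. If $D = \bot$ then $\mathcal{DS}(D) = \bot$.

2. If $D$ ($\neq \bot$) is a unitary α-diagram then $\mathcal{DS}(D) = \bigwedge_{z \in VZ(D)} \mathcal{ZS}(z, D)$.

3. If $D = D_1 \sqcup D_2$ then $\mathcal{DS}(D) = (\mathcal{DS}(D_1) \lor \mathcal{DS}(D_2))$.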
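Definitions 7 to 9 can be sketched in Python as a small sentence generator. This is an illustrative sketch, not the authors' implementation: the ASCII connectives (`&`, `|`, `~`, `forall`, `exists`, `T`), the function names, and the encoding of a unitary α-diagram as a mapping from each Venn zone to a pair (shaded or missing, number of spiders) are all assumptions. The zones and spider counts for `d1` follow Example 4, assuming its remaining zones are present and unshaded.

```python
from typing import Dict, FrozenSet, Tuple

# Zone (a, b) as sets of contour indices k, standing for the labels L_k.
Zone = Tuple[FrozenSet[int], FrozenSet[int]]


def zone_formula(zone: Zone, var: str) -> str:
    """ZF: P_k(var) for L_k in a, ~P_k(var) for L_k in b; T for the zone (0, 0)."""
    inside, outside = zone
    literals = [f"P{k}({var})" for k in sorted(inside)]
    literals += [f"~P{k}({var})" for k in sorted(outside)]
    return " & ".join(literals) if literals else "T"


def zone_sentence(zone: Zone, shaded: bool, spiders: int) -> str:
    """ZS: the four cases of Definition 8 (shaded also covers 'missing')."""
    if spiders == 0:
        return f"forall x1 ~({zone_formula(zone, 'x1')})" if shaded else "T"
    xs = [f"x{i}" for i in range(1, spiders + 1)]
    # Pairwise distinctness of the witnesses, then membership of each in the zone.
    parts = [f"~({xs[k]} = {xs[j]})" for k in range(spiders) for j in range(k)]
    parts += [f"({zone_formula(zone, x)})" for x in xs]
    if shaded:
        # Upper bound: any further element is a witness or lies outside the zone.
        extra = f"x{spiders + 1}"
        options = [f"{extra} = {x}" for x in xs]
        options.append(f"~({zone_formula(zone, extra)})")
        parts.append(f"forall {extra} ({' | '.join(options)})")
    prefix = " ".join(f"exists {x}" for x in xs)
    return f"{prefix} ({' & '.join(parts)})"


def diagram_sentence(diagram: Dict[Zone, Tuple[bool, int]]) -> str:
    """DS for a unitary alpha-diagram: the conjunction of its zone sentences."""
    conjuncts = [f"({zone_sentence(z, sh, n)})" for z, (sh, n) in diagram.items()]
    return " & ".join(conjuncts) or "T"


d1 = {
    (frozenset(), frozenset({1, 2})): (False, 1),   # one spider outside L1 and L2
    (frozenset({1}), frozenset({2})): (False, 0),
    (frozenset({2}), frozenset({1})): (False, 2),   # two spiders in L2 only
    (frozenset({1, 2}), frozenset()): (False, 0),
}
print(diagram_sentence(d1))
```

Up to the trivial conjuncts `(T)`, the printed sentence has the same shape as the sentence given for $d_1$ in Example 4.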

Theorem 2. Let $D$ be a disjunction of unitary α-diagrams. Then $D$ is expressively
equivalent to $\mathcal{DS}(D)$.

Hence the language of spider diagrams is at most as expressive as $\mathcal{ESD}$.

Fig. 5. Extending models for a diagram



6 Mapping from Sentences to Diagrams 

We now consider the more challenging task of constructing a diagram for a sentence.
Since every formula is semantically equivalent to a sentence obtained by
prefixing the formula with $\forall x_i$ for each free variable $x_i$ (i.e. constructing its
universal closure) we only need to identify a diagram expressively equivalent to each
sentence.

Shin’s approach for Venn-II and her language $\mathcal{L}_0$ ($\mathcal{ESD}$ without equality)
is algorithmic [9], and we now outline it. To find a diagram expressively equivalent
to a sentence she first converts the sentence into prenex normal form,
say $Q_1 x_1 \ldots Q_n x_n G$ where $G$ is quantifier free. If $Q_n$ is universal then $G$ is
transformed into conjunctive normal form. If $Q_n$ is existential then $G$ is transformed
into disjunctive normal form. Quantifier $Q_n$ is then distributed through $G$ and
as many formulae are removed from its scope as possible. All $n$ quantifiers are
distributed through in this way. A diagram can then be drawn for each of the
simple parts of the resulting formula. To adapt this algorithm to sentences in
$\mathcal{ESD}$ that do not involve equality is straightforward.

This algorithm does not readily generalize to arbitrary sentences in $\mathcal{ESD}$
because $=$ is a dyadic predicate symbol, which means nesting of quantifiers cannot
necessarily be removed. Thus we take a different approach, modelled on what
appears in [1], pages 209-210. To establish the existence of a diagram expressively
equivalent to a sentence we consider models for that sentence. To illustrate the
approach we consider relationships between models for α-diagrams. We begin by
considering a particular example.

Example 5. The diagram in figure 5 has a minimal model (in the sense that
the cardinality of the universal set is minimal) $U = \{1,2,3\}$, $\Psi(L_1) = \{1\}$,
$\Psi(L_2) = \{2,3\}$ and, for $i \neq 1,2$, $\Psi(L_i) = \emptyset$. This model can be used to generate
all the models for the diagram. To generate further models, we can add elements
to $U$ and we may add these elements to images of contour labels if we so choose.
We can also rename the elements in $U$. As an example, the element 4 can be
added to $U$ and we redefine $\Psi(L_2) = \{2,3,4\}$ to give another model for $d$.
No matter what changes we make to the model, we must ensure that the zone
$(\{L_1\},\{L_2\})$ always represents a set containing exactly one element or we will
create an interpretation that does not satisfy the diagram.

If a sentence, $S$, is expressively equivalent to a unitary α-diagram, $d$, then we
will be able to take a minimal model for $S$ and use this model to generate all
other models for $S$ in the same manner as above. Given a structure, we will
define a predicate intersection set. This set is analogous to the image of a zone
in an interpretation.

Definition 10. Let $m$ be a structure and $X$ and $Y$ be finite subsets of $\mathcal{P}$ (the
countably infinite set of predicate symbols). Define the predicate intersection
set in $m$ with respect to $X$ and $Y$, denoted $PI(m, X, Y)$, to be

$PI(m, X, Y) = \bigcap_{P_i \in X} P_i^m \cap \bigcap_{P_i \in Y} \overline{P_i^m}$.

We define $\bigcap_{P_i \in \emptyset} P_i^m = \bigcap_{P_i \in \emptyset} \overline{P_i^m} = U$ where $U$ is the domain of $m$.
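A predicate intersection set is straightforward to compute for a finite structure. The sketch below is illustrative only: a structure is encoded as a domain plus a dictionary from predicate names to their interpretations, and the name `predicate_intersection` is an assumption rather than anything defined in the paper.

```python
from typing import Dict, Iterable, Set


def predicate_intersection(domain: Set[int], preds: Dict[str, Set[int]],
                           X: Iterable[str], Y: Iterable[str]) -> Set[int]:
    """PI(m, X, Y): elements in every predicate of X and in no predicate of Y."""
    result = set(domain)  # both empty intersections are defined to be U
    for p in X:
        result &= preds.get(p, set())  # predicates not listed are empty
    for p in Y:
        result -= preds.get(p, set())  # intersect with the complement
    return result


# An illustrative structure with domain {1, 2, 3, 4}, P1 = {1, 2}, P2 = {}.
domain = {1, 2, 3, 4}
preds = {"P1": {1, 2}, "P2": set()}

print(predicate_intersection(domain, preds, [], ["P1", "P2"]))
print(predicate_intersection(domain, preds, ["P1"], ["P2"]))
print(predicate_intersection(domain, preds, ["P2"], ["P1"]))
print(predicate_intersection(domain, preds, [], []))
```

With $Y = P(S) - X$, the sets $PI(m, X, Y)$ partition the domain, just as the zone images of a Venn diagram partition the universal set.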

In the context of $\mathcal{ESD}$, we will identify all the structures that can be generated
from a given structure, $m$, by adding or renaming elements subject to cardinality
restrictions. We will call this class of structures generated by $m$ the cone of $m$.
For each sentence, $S$, we will show that there is a finite set of models, the union
of whose cones give rise to only and all the models for $S$. Central to our approach
is the notion of similar structures with respect to $S$. To define similar structures
we use the maximum number of nested quantifiers in $S$.

Example 6. Let $S$ be the sentence $\forall x_1\, P_1(x_1) \land \forall x_1 \exists x_2\, x_1 \neq x_2$. The formula
$\forall x_1\, P_1(x_1)$ has one nested quantifier and $\forall x_1 \exists x_2\, x_1 \neq x_2$ has two nested
quantifiers. Therefore the maximum number of nested quantifiers in $S$ is two. Now, $n$
nested quantifiers introduce $n$ names, and so it is only possible to talk about (at
most) $n$ distinct individuals within the body of the formula. This has the effect
of limiting the complexity of what can be said by such a formula. In the particular
case here, this observation has the effect that if a model for $S$ has more
than two elements in certain predicate intersection sets then $S$ cannot place an
upper bound on the cardinalities of these predicate intersection sets.

The interpretation of $P_1$ has to have all the elements, of which there must be
at least two. Also $S$ constrains the predicate intersection set $PI(m, \emptyset, \{P_1\})$ to
have cardinality zero. As an example, we consider two models, $m_1$ and $m_2$ with
domains $U_1 = \{1,2,3,4\}$ and $U_2 = \{1,2,5,6,7\}$ respectively that are partially
defined by $P_1^{m_1} = \{1,2,3,4\}$ and $P_1^{m_2} = \{1,2,5,6,7\}$. Now

$|PI(m_1, \emptyset, \{P_1\})| = |\emptyset| = 0 < 2$ and $|PI(m_2, \emptyset, \{P_1\})| = |\emptyset| = 0 < 2$.

Also

$|PI(m_1, \{P_1\}, \emptyset)| = |U_1| > 2$ and $|PI(m_2, \{P_1\}, \emptyset)| = |U_2| > 2$,

so $S$ cannot place an upper bound on $|PI(m, \{P_1\}, \emptyset)|$. We can think of $m_1$
and $m_2$ extending $m_3$ with domain $U_3 = \{1,2\}$ where $P_1^{m_3} = \{1,2\}$ and $P_j^{m_3} = \emptyset$,
for all $j \neq 1$.

[Figure 6: a picture of $cone(m, S)$ for a structure $m$ over $\{P_1, P_2\}$ with $q(S) = 2$,
annotated with $|PI(m, \emptyset, \{P_1, P_2\})| = 2$, $|PI(m, \{P_1\}, \{P_2\})| = 2$,
$|PI(m, \{P_2\}, \{P_1\})| = 0$ and $|PI(m, \{P_1, P_2\}, \emptyset)| = 0$.]

Fig. 6. Visualizing cones



Definition 11. Let $S$ be a sentence and define $q(S)$ to be the maximum number
of nested quantifiers in $S$ and $P(S)$ to be the set of monadic predicate symbols
in $S$. Structures $m_1$ and $m_2$ are called similar with respect to $S$ if and only
if for each subset $X$ of $P(S)$, either

1. $PI(m_1, X, P(S) - X) = PI(m_2, X, P(S) - X)$ or

2. $|PI(m_1, X, P(S) - X) \cap PI(m_2, X, P(S) - X)| \geq q(S)$

and for all subsets $Y$ of $P(S)$ such that $X \neq Y$, $PI(m_1, X, P(S) - X) \cap
PI(m_2, Y, P(S) - Y) = \emptyset$. Adapted from [1].

In the previous example, $m_1$, $m_2$ and $m_3$ are all similar with respect to $S$.
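For finite structures, Definition 11 can be checked mechanically. The following Python sketch uses a hypothetical encoding (a structure is a pair of a domain and a dictionary of predicate interpretations) and hypothetical function names; it also assumes that the disjointness clause is required for every $X$, whichever of conditions 1 and 2 holds. The test data are the three structures of Example 6, with $P(S) = \{P_1\}$ and $q(S) = 2$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

Structure = Tuple[Set[int], Dict[str, Set[int]]]  # (domain, predicate interpretations)


def subsets(symbols: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """All subsets of a finite set of predicate symbols."""
    items = sorted(symbols)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def pi(structure: Structure, X: Iterable[str], Y: Iterable[str]) -> Set[int]:
    """Predicate intersection set PI(m, X, Y) of Definition 10."""
    domain, preds = structure
    result = set(domain)
    for p in X:
        result &= preds.get(p, set())
    for p in Y:
        result -= preds.get(p, set())
    return result


def similar(m1: Structure, m2: Structure, symbols: Iterable[str], q: int) -> bool:
    """Similarity with respect to a sentence with predicate symbols `symbols` and depth q."""
    symbols = frozenset(symbols)
    for X in subsets(symbols):
        a = pi(m1, X, symbols - X)
        b = pi(m2, X, symbols - X)
        if not (a == b or len(a & b) >= q):
            return False  # neither condition 1 nor condition 2 holds
        for Y in subsets(symbols):
            if Y != X and a & pi(m2, Y, symbols - Y):
                return False  # sets for different X and Y must be disjoint
    return True


m1 = ({1, 2, 3, 4}, {"P1": {1, 2, 3, 4}})
m2 = ({1, 2, 5, 6, 7}, {"P1": {1, 2, 5, 6, 7}})
m3 = ({1, 2}, {"P1": {1, 2}})
print(similar(m1, m2, {"P1"}, 2), similar(m1, m3, {"P1"}, 2), similar(m2, m3, {"P1"}, 2))
```

All three pairs are reported as similar, because the sets $PI(m_i, \{P_1\}, \emptyset)$ share the two elements 1 and 2 and $q(S) = 2$.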

Lemma 1. Let $m_1$ and $m_2$ be similar structures with respect to sentence $S$.
Then $m_1$ is a model for $S$ if and only if $m_2$ is a model for $S$ [1].

Lemma 1 essentially tells us that any model for a sentence, $S$, with cardinality
greater than $2^{|P(S)|} q(S)$ can be restricted to give another model for $S$ with
cardinality at most $2^{|P(S)|} q(S)$ (there are $2^{|P(S)|}$ predicate intersection sets of the
form $PI(m, X, P(S) - X)$, and each can be cut down to at most $q(S)$ elements). If
the cardinality of model $m$ for sentence $S$ is at most $2^{|P(S)|} q(S)$ then we say $m$ is
a small model for $S$. Otherwise we say $m$ is a large model for $S$.

Definition 12. Let $S$ be a sentence and $m_1$ be a small model for $S$. The cone
of $m_1$ given $S$, denoted $cone(m_1, S)$, is a class of structures such that $m_2 \in
cone(m_1, S)$ if and only if for each subset $X$ of $P(S)$, there exists an injective
map, $f_X : PI(m_1, X, P(S) - X) \to PI(m_2, X, P(S) - X)$ which is bijective when
$|PI(m_1, X, P(S) - X)| < q(S)$.

The cone of $m$ given $S$ contains models for $S$ that can be restricted to (models
isomorphic to) $m$. We can think of elements of $cone(m, S)$ as enlarging $m$ in
certain ‘directions’ (adding elements to predicate intersection sets) and ‘fixing’
(keeping predicate intersection sets the same) $m$ in others.

Fig. 7. A diagram expressively equivalent to $\forall x \forall y\, x = y$



Example 7. Let $S$ be the sentence $\exists x_1 \exists x_2\, P_1(x_1) \lor P_2(x_2)$ and consider $m =
\langle \{1,2,3,4\}, =^m, \{1,2\}, \emptyset, \emptyset, \ldots \rangle$. A visual analogy of $cone(m, S)$ can be seen in
figure 6. Structure $m_1 = \langle \{1,2,3,4,5,6\}, =^{m_1}, \{1,2,5\}, \emptyset, \emptyset, \ldots \rangle$ can be obtained
from $m$ by enlarging $PI(m, \emptyset, \{P_1, P_2\})$ and $PI(m, \{P_1\}, \{P_2\})$ by adding
elements to these sets (and the domain), but keeping $PI(m, \{P_2\}, \{P_1\})$ and
$PI(m, \{P_1, P_2\}, \emptyset)$ fixed. Another element of $cone(m, S)$ is the structure $m_2 =
\langle \{7,8,9,10\}, =^{m_2}, \{7,8\}, \emptyset, \emptyset, \ldots \rangle$. Here, $m_2$ renames the elements in $m$. The
structure $m_3 = \langle \{1,2,3,4\}, =^{m_3}, \{1\}, \emptyset, \emptyset, \ldots \rangle$ is not in $cone(m, S)$, since there is
not an injective map from $PI(m, \{P_1\}, \{P_2\})$ to $PI(m_3, \{P_1\}, \{P_2\})$.
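For finite structures, cone membership reduces to comparing cardinalities: an injective map between finite sets exists exactly when the source is no larger than the target, and such a map can be bijective exactly when the sizes agree. The Python sketch below relies on that observation; the encoding of structures and the function names are assumptions, and the check does not verify that the first structure is a small model. The data are the structures of Example 7, with $P(S) = \{P_1, P_2\}$ and $q(S) = 2$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

Structure = Tuple[Set[int], Dict[str, Set[int]]]  # (domain, predicate interpretations)


def subsets(symbols: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """All subsets of a finite set of predicate symbols."""
    items = sorted(symbols)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def pi(structure: Structure, X: Iterable[str], Y: Iterable[str]) -> Set[int]:
    """Predicate intersection set PI(m, X, Y) of Definition 10."""
    domain, preds = structure
    result = set(domain)
    for p in X:
        result &= preds.get(p, set())
    for p in Y:
        result -= preds.get(p, set())
    return result


def in_cone(m1: Structure, m2: Structure, symbols: Iterable[str], q: int) -> bool:
    """Is the finite structure m2 in cone(m1, S)? Compares set sizes only."""
    symbols = frozenset(symbols)
    for X in subsets(symbols):
        a = len(pi(m1, X, symbols - X))
        b = len(pi(m2, X, symbols - X))
        if a > b:
            return False  # no injective map exists
        if a < q and a != b:
            return False  # below q(S) the map must be bijective
    return True


m = ({1, 2, 3, 4}, {"P1": {1, 2}})
m1 = ({1, 2, 3, 4, 5, 6}, {"P1": {1, 2, 5}})
m2 = ({7, 8, 9, 10}, {"P1": {7, 8}})
m3 = ({1, 2, 3, 4}, {"P1": {1}})
print(in_cone(m, m1, {"P1", "P2"}, 2), in_cone(m, m2, {"P1", "P2"}, 2),
      in_cone(m, m3, {"P1", "P2"}, 2))
```

The result agrees with the example: the enlarged structure and the renamed structure lie in the cone of $m$, whereas $m_3$ does not.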

Example 8. Let $S$ be the sentence $\forall x \forall y\, x = y$ and consider the structure $m_1 =
\langle \{1\}, =^{m_1}, \emptyset, \emptyset, \emptyset, \ldots \rangle$ which satisfies $S$. We have the following cone for $m_1$:

$cone(m_1, S) = \{m_2 \in \mathcal{STR} : |PI(m_1, \emptyset, \emptyset)| = |\{1\}| = |PI(m_2, \emptyset, \emptyset)|\}$.

The class $cone(m_1, S)$ contains only structures that are models for $S$ but does
not contain them all, for example $m_3 = \langle \emptyset, \emptyset, \ldots \rangle$ satisfies $S$ but $m_3$ is not in
$cone(m_1, S)$. All models for $S$ are in the class $cone(m_1, S) \cup cone(m_3, S)$. In
this sense, $m_1$ and $m_3$ classify all the models for $S$. We can draw a diagram
expressively equivalent to $S$ using information given by $m_1$ and $m_3$. This diagram
is a disjunction of two unitary diagrams, shown in figure 7. The unitary diagram
arising from $m_1$ has one spider, no contours and is entirely shaded. That arising
from $m_3$ has no spiders, no contours and is entirely shaded.

We will show that, given a sentence, S , there is a finite set of small models, the 
union of whose cones give rise to only and all the models for S. We are able to 
use these models to identify a diagram expressively equivalent to S. In order to 
identify such a finite set we require the notion of partial isomorphism between 
structures. 

Definition 13. Let $m_1$ and $m_2$ be structures with domains $U_1$ and $U_2$ respectively.
Let $Q$ be a set of monadic predicate symbols. If there exists a bijection
$\gamma : U_1 \to U_2$ such that

$\forall P_i \in Q\ \forall x \in U_1 \bullet x \in P_i^{m_1} \Leftrightarrow \gamma(x) \in P_i^{m_2}$

then $m_1$ and $m_2$ are isomorphic restricted to $Q$ and $\gamma$ is a partial isomorphism.

If $m_1$ and $m_2$ are isomorphic restricted to $P(S)$ then $m_1$ is a model for $S$ if
and only if $m_2$ is a model for $S$. Also, there are finitely many small models for
sentence $S$, up to isomorphism restricted to $P(S)$.

Definition 14. Let $S$ be a sentence. A set of small models, $class(S)$, for $S$ is
called a classifying set of models for $S$ if for each small model, $m_1$, for $S$
there is a unique $m_2$ in $class(S)$ such that $m_1$ and $m_2$ are isomorphic, restricted
to $P(S)$.

Theorem 3. Let $class(S)$ be a classifying set of models for sentence $S$. Then
$class(S)$ is finite and $\bigcup_{m \in class(S)} cone(m, S)$ contains all and only models for $S$.

Definition 15. Let $m$ be a small model for sentence $S$. The unitary α-diagram,
$d$, representing $m$, is defined as follows (footnote 2).

1. The contour labels arise from the predicate symbols in $P(S)$:

$\{L_i \in \mathcal{C} : \exists P_i \in \mathcal{P} \bullet P_i \in P(S)\}$.

2. The diagram is in Venn form:

$Z(d) = \{(a, b) : a \subseteq L(d) \land b = L(d) - a\}$.

3. The shaded zones in $d$ are given as follows. Let $X$ be a subset of $P(S)$ such
that $|PI(m, X, P(S) - X)| < q(S)$. The zone $(a, b)$ in $Z(d)$ where $a = \{L_i :
P_i \in X\}$ is shaded.

4. The number of spiders in each zone is the cardinality of the set $PI(m, X,
P(S) - X)$ where $X$ gives rise to the containing set of contour labels for that
zone. The set of spider identifiers is then given by:

$SI(d) = \{(n, r) : \exists X \bullet X \subseteq P(S) \land |PI(m, X, P(S) - X)| > 0 \land
n = |PI(m, X, P(S) - X)| \land r = \{(a, b) \in Z(d) : a = \{L_i : P_i \in X\}\}\}$.

We write $REP(m, S) = d$. Let $class(S)$ be a set of classifying models for $S$.
Define $\mathcal{D}(S)$ to be a disjunction of unitary diagrams, given by

$\mathcal{D}(S) = \bigsqcup_{m \in class(S)} REP(m, S)$,

unless $class(S) = \emptyset$, in which case $\mathcal{D}(S) = \bot$.

Example 9. Let $S$ be the sentence $\exists x_1\, P_1(x_1) \lor \forall x_1\, P_1(x_1)$. To find a classifying
set of models we must consider structures of all cardinalities up to $2^{|P(S)|} \times q(S) =
2^1 \times 1 = 2$. There are six distinct structures (up to isomorphism restricted to
$P(S)$) with cardinality at most 2. Four of these structures are models for $S$ and
are listed below.

Footnote 2. In fact, $d$ is a β-diagram [8] (every zone is shaded or inhabited by at least one spider).

Fig. 8. Constructing diagrams from models



1. $m_1 = \langle \emptyset, \emptyset, \ldots \rangle$,

2. $m_2 = \langle \{1\}, =^{m_2}, \{1\}, \emptyset, \emptyset, \ldots \rangle$,

3. $m_3 = \langle \{1,2\}, =^{m_3}, \{1\}, \emptyset, \emptyset, \ldots \rangle$,

4. $m_4 = \langle \{1,2\}, =^{m_4}, \{1,2\}, \emptyset, \emptyset, \ldots \rangle$.

Therefore, the class $cone(m_1, S) \cup cone(m_2, S) \cup cone(m_3, S) \cup cone(m_4, S)$
contains only and all the models for $S$. We use each of these models to construct
a diagram. Models $m_1$, $m_2$, $m_3$ and $m_4$ give rise to $d_1$, $d_2$, $d_3$ and $d_4$ in figure 8
respectively. Diagram $d_1 \sqcup d_2 \sqcup d_3 \sqcup d_4$ is expressively equivalent to $S$. This is
not the ‘natural’ diagram one would associate with $S$.
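The enumeration in Example 9 can be reproduced with a short Python sketch. It is illustrative only: since $P(S) = \{P_1\}$, a structure is encoded (up to isomorphism restricted to $P(S)$) by a domain $\{1, \ldots, n\}$ together with $P_1 = \{1, \ldots, k\}$, and the names `is_model` and `representing_diagram` are assumptions. The second function follows items 3 and 4 of Definition 15 for the two zones of a one-contour Venn diagram.

```python
from typing import Dict, List, Set, Tuple

Q_S = 1                      # maximum number of nested quantifiers in S
BOUND = 2 ** 1 * Q_S         # 2^{|P(S)|} * q(S) with P(S) = {P1}


def is_model(domain: Set[int], p1: Set[int]) -> bool:
    """S = (exists x1 P1(x1)) or (forall x1 P1(x1)); empty domains are allowed."""
    return any(x in p1 for x in domain) or all(x in p1 for x in domain)


# One representative per isomorphism class: |domain| = n and |P1| = k.
structures: List[Tuple[Set[int], Set[int]]] = [
    (set(range(1, n + 1)), set(range(1, k + 1)))
    for n in range(BOUND + 1)
    for k in range(n + 1)
]
models = [s for s in structures if is_model(*s)]


def representing_diagram(domain: Set[int], p1: Set[int], q: int) -> Dict[str, dict]:
    """Each zone gets one spider per element and is shaded when it has fewer than q."""
    zones = {"inside L1": p1, "outside L1": domain - p1}
    return {name: {"spiders": len(image), "shaded": len(image) < q}
            for name, image in zones.items()}


print(len(structures), len(models))
for domain, p1 in models:
    print(domain, p1, representing_diagram(domain, p1, Q_S))
```

The sketch finds six structures of cardinality at most 2, of which four are models, listed in the same order as $m_1$ to $m_4$ above.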



Theorem 4. Let $S$ be a sentence and $class(S)$ be a set of classifying models
for $S$. Then $S$ is expressively equivalent to $\mathcal{D}(S)$.

Hence the language of spider diagrams and $\mathcal{ESD}$ are equally expressive.

7 Conclusion 

In this paper we have identified a well known fragment of first order predicate
logic equivalent in expressive power to the spider diagram language. To show
that the spider diagram language is at most as expressive as $\mathcal{ESD}$, we identified
a sentence in $\mathcal{ESD}$ that expressed the same information as a given diagram. To
show that $\mathcal{ESD}$ is at most as expressive as the language of spider diagrams we
considered relationships between models for sentences. We have shown that it is
possible to classify all the models for a sentence by a finite set of models. These
models can be used to define a spider diagram expressively equivalent to $S$.

The spider diagram language extends to the far more expressive constraint
diagram language [7]. Constraint diagrams allow relational navigation (expressions
involving two place predicates). The diagram in figure 9 is a constraint
diagram. In addition to the information provided by the underlying spider diagram,
it expresses that ‘for all x in B − A, the relational image of x under g
is A and there is a y in A − B whose relational image under f is an element
of C’. It is currently unknown what fragment of first order predicate logic can
be expressed using constraint diagrams. Various constraint diagram languages
exist. The simplest of these restricts the syntactic components and the semantic
interpretation of the diagrams [10]. In [3] the authors give a reading algorithm
for interpreting more expressive constraint diagrams.

Some logical assertions are more naturally expressed in one language than
another. This may lead to the development of heterogeneous reasoning systems.
An example of such a system based on first order predicate logic and Euler/Venn
diagrams can be found in [11]. We plan to develop a heterogeneous reasoning
system incorporating constraint diagrams. The other languages included may
be influenced by the expressiveness of the languages involved. Thus it will be
useful to know how expressive constraint diagrams are. This work on the
expressiveness of spider diagrams will lay the foundations for an investigation into the
expressiveness of constraint diagrams.

Abstract. This paper shows by a constructive method the existence of a
diagrammatic representation called extended Euler diagrams for any
collection of sets $X_1, \ldots, X_n$, $n < 9$. These diagrams are adapted for
representing set inclusions and intersections: each set $X_i$ and each non-empty
intersection of a subcollection of $X_1, \ldots, X_n$ is represented by a unique
connected region of the plane. Starting with an abstract description of
the diagram, we define the dual graph G and reason with the properties
of this graph to build a planar representation of the $X_1, \ldots, X_n$. These
diagrams will be used to visualize the results of a complex request on
any indexed video database. In fact, such a representation allows the
user to perceive simultaneously the results of his query and the relevance
of the database according to the query.



1 Introduction 

Nowadays, enhancing the visualization of the results of complex queries in large
databases is becoming a challenging and useful task [3, 2]. We propose to tackle
this problem by providing the user with a semantic cartography of the set of
documents answering his complex query. Let us illustrate our approach with a
complex query built by students on INA’s database of video documents. They had to
work on the evolution of transportation in Paris. Figure 1 shows two diagrams
built from the fields Paris, works, subway and beltway. The leftmost diagram
is a Venn diagram which contains all combinations of the fields; the rightmost
one is an Extended Euler Diagram where only non-empty regions appear. These
two diagrams show the distribution of the documents in the database according
to the user's criteria, giving the contribution of each field in order to help the user
elaborate a new formulation of the query. Moreover, these diagrams may be
used as an interface to compose boolean expressions interactively by selecting
regions, helping users not familiar with brackets and operator order to build
complex queries.

Fig. 1. Venn diagram (left) and Extended Euler Diagram (right) built from the fields
(A) "Paris", (B) "works", (C) "subway", (D) "beltway". Grey levels of the regions are
linked to the number of documents in the region: white regions are empty ones

A. Blackwell et al. (Eds.): Diagrams 2004, LNAI 2980, pp. 128-141, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004

This work has been done in the context of INA: INA has been the French radio and
television legal deposit center since the law of June 1992. INA’s archives contain
more than 3 million documents representing approximately 400 000 hours
of video and 500 000 hours of audio programs. Recently the consultation of
the database has been opened to people who are not documentation professionals,
such as researchers or producers, creating the need for more user-friendly interfaces.
When a new document is inserted in the database, it is described and analyzed by
professionals, using predefined structured classifications (thesaurus, controlled
lists...) to allow, as much as possible, unambiguous identification of the documents.
As a consequence, many indices of the controlled structures are exclusive
and, in the associated diagram representation, many regions may be empty.
This observation enhances the relevance of Extended Euler Diagrams (EED)
compared to Venn diagrams. Indeed, if the intersection of k fields is empty, any
intersection of those k fields with other ones is empty too. An EED may therefore
contain significantly fewer regions than a Venn diagram, leading to a much more
readable cartography. However, in such an application, we need to have an
Extended Euler Diagram for any combination of fields and a method to build it
dynamically.

This paper is organized as follows: 

— We first describe previous works on diagrams and introduce a description of 
Extended Euler Diagrams (EED in the following), and their properties. 
Then, we give a formal definition of EED and introduce a dual representa- 
tion, in terms of graph formalization: the L_connected labelled graphs. 

— In the second section, we show that the drawability of the EED is related to 
the planarity of the L_connected labelled graph and give limitations on the 
number of sets being represented by EED. 

— The third section contains a sketch of the proof which builds, for any
collection of $n < 9$ sets, a planar L_connected labelled graph representing it. We
first build a minimal subgraph on a specific subset of vertices, then we show
that this graph is planar and that any other vertex can be inserted in this
minimal subgraph without breaking the planarity.

2 EED Definition

2.1 Previous Works 

Given $X = \{X_1, X_2, \ldots, X_k\}$ a collection of non-empty distinct sets, we want to
build a graphic representation which shows information about the sets and their
intersections on a plane. Let $Y = \{Y_1, Y_2, \ldots, Y_{2^k}\}$ be the collection of all
intersections between the $X_i$ and $Y' = \{Y_i \mid Y_i \in Y \text{ and } Y_i \neq \emptyset\}$. Euler diagrams [4]
could be used but appear to be too restrictive for our purpose. In fact, an Euler
diagram consists of a collection of simple closed and convex curves, called
contours, which split the plane into zones. Each set $X_i$ is associated with a unique
contour, and is represented by the interior of this contour. However, because
of the convexity constraint, some $Y'$ cannot be drawn with an Euler diagram
when the number of sets $X_i$ is equal to 4 (for a discussion on Euler diagrams
restrictions, the reader may consult [9, 11]).

The concrete Euler diagrams proposed in Flower and Howse’s approach [5],
used for another purpose [6], are very well defined but are still very restrictive.
In fact, concrete Euler diagrams are Euler diagrams with very strong constraints.
The first constraints, introduced at the curve level, make hypotheses on
the set of intersections being drawn: each segment of curve delimits the interior
and the exterior of exactly one set, and each intersection of curves is the crossing
of exactly two contours. The introduction of “exactly” is very useful to specify
formally the problem and its dual formulation with graphs, but eliminates the
numerous cases in which the set of subsets built from the intersections of the $X_i$
does not have such properties.

According to our purposes, we propose an extension of Euler diagrams which
makes drawable any collection $Y'$ from a set $X$ of $X_i$ such that $card(X) \leq 8$.
Such diagrams are characterized by the following properties:

— An intersection point may lie on more than two contours,

— A curve segment may be part of more than one contour,

— Each non-empty $Y_i$ is associated with a unique zone,

— Each set $X_i$ is associated with a set of zones whose union forms a connected
planar region. This region may not be convex and may contain holes.

2.2 Extended Euler Diagrams 

Definition 1. Let $L$ be a finite set of labels and $C$ a set of simple closed (Jordan)
curves in $\mathbb{R}^2$. We say that $C$ is labelled by $L$ when each curve $c$ of $C$ is associated
with a couple $(\lambda(c), sign(c))$ where $\lambda(c) \in L$ and $sign(c) \in \{+, -\}$.

To each labelled curve $c$ of $C$ corresponds a zone $\zeta(c)$ defined by:

— if $sign(c) = +$, then $\zeta(c) = int(c)$

— if $sign(c) = -$, then $\zeta(c) = ext(c)$

Fig. 2. A: a zone with two holes, $sign(c_2) = sign(c_3) = -$; B: an extended Euler
diagram with $m(z_1) = \{a\}$, $m(z_2) = \{a, b, c\}$ and $m(z_3) = \{c\}$

Definition 2. An extended Euler diagram is a triple $(L, C, Z)$ whose components
are defined as follows:

1. $L$ is a finite set of labels

2. $C$ is a set of Jordan curves labelled by $L$ and verifying:

(a) $\forall l \in L$, $\exists c \in C$, $\lambda(c) = l$ and $sign(c) = +$.

(b) if $\lambda(c) = \lambda(c')$, $c \neq c'$ and $sign(c) = sign(c')$ then $c$ and $c'$ do not
intersect

(c) if $\lambda(c) = \lambda(c')$, $c \neq c'$ and $sign(c) = +$, then $sign(c') = -$ and
$c' \subset int(c)$

3. $Z$ is a set of zones corresponding to the planar partition defined by $C$.

Each zone $z$ of $Z$ is associated with a set of labels $m(z)$ defined by

(a) $m(z) = \{l \in L \mid \forall c \in C, \text{ if } \lambda(c) = l \text{ then } z \subseteq \zeta(c)\}$

(b) if $m(z) = m(z')$ and $m(z) \neq \emptyset$, then $z = z'$

We denote by $Z_\emptyset$ the set of zones associated with an empty set of labels.

$Z_\emptyset$ contains at least the zone $z_\emptyset = \bigcap_{\{c \mid sign(c) = +\}} ext(c)$.

The set of extended Euler diagrams is denoted $\mathcal{EED}$.

As a matter of fact, we have introduced Jordan curves to define zones, but those
notions are equivalent. In the following, we will rather use the zones formalization.

Definition 3. Let $X = \{X_1, X_2, \ldots, X_k\}$ be a set of non-empty distinct subsets
of X, $Y' = \{Y_1, Y_2, \ldots, Y_m\}$ the set of all possible non-empty intersections
between the $X_i$ ($m < 2^k$). We say that the extended Euler diagram $(L, C, Z)$ is a
diagram representation of $X$ if and only if:

1. there is a bijection $\psi : L \to X$; $l \mapsto x$

2. $\phi : Z \setminus Z_\emptyset \to Y'$; $z \mapsto y$ defined by $\phi(z) = y = \bigcap_{l \in m(z)} \psi(l)$ is a bijection.

2.3 L_Connected Labelled Graphs 

An L_connected labelled graph is a labelled graph which ensures that, for any
label l of L, there is a path connecting all the vertices labelled by l. We give in
this section a more formal definition of L_connected labelled graphs.

Definition 4. A labelled graph is a triple $G(L, V, E)$ where:

1. $L$ is a finite set of labels

2. $V$ is a set of labelled vertices, i.e.:

(a) each vertex $v$ is labelled with a set of labels $m(v) \subseteq L$

(b) two distinct vertices $v$ and $w$ of $V$ have distinct sets of labels.

3. $E$ is a set of edges such that:

(a) each edge $e = (v, w)$ of $E$ is labelled with a set of labels $m(e) = m(v) \cap m(w)$

(b) if $e \in E$ then $m(e) \neq \emptyset$

Fig. 3. $L = \{a, b, c, d\}$, $V = \{abd, bd, bc, cd, ad, ac\}$,
$E_1 = \{(abd, bd), (bd, bc), (bc, cd), (cd, ad), (ad, ac), (ac, abd)\}$,
$E = E_1 \cup \{(abd, cd), (cd, ac)\}$, $E_c = E \cup \{(ac, bc), (bc, abd), (abd, ad), (ad, bd), (bd, cd)\}$.
A: $G(L, V, E_1)$ is a_connected and b_connected but it is neither c_connected nor d_connected;
B: $G(L, V, E)$ is an L_connected labelled graph;
C: $G(L, V, E_c)$ is the corresponding L_complete graph;
D: ac is L_connectable to $W = \{abd, bd, bc\}$

In the rest of the paper, $L(W)$ will be the set of labels associated with the
vertices of $W$, i.e. $L(W) = \bigcup_{v \in W} m(v)$, where $W$ is a set of labelled vertices.

Definition 5. Let $G(L, V, E)$ be a labelled graph (cf. figure 3 for an example).

— Let $l$ be a label of $L$. We say that $G(L, V, E)$ is l_connected if and only if the
subgraph $G'$ of $G(L, V, E)$ on the set $V'$ of vertices of $V$ having $l$ in their sets
of labels is connected.

— $G(L, V, E)$ is said to be L_connected if and only if it is l_connected for all $l$ in $L$.

— $G(L, V, E)$ is said to be L_complete when $E$ is defined by:

$E = \{(v, w) \mid v \in V, w \in V \text{ and } m(v) \cap m(w) \neq \emptyset\}$

— A vertex $v$ of $V$ is said to be L_connectable to a subset $W$ of $V$ if and only if
$m(v) \subseteq L(W)$.

In fact, given $L$ and $V$, there exists only one L_complete labelled graph
$G(L, V, E)$, denoted $G(L, V, E_c)$, and any L_connected labelled graph $G(L, V, E)$ is
a subgraph of $G(L, V, E_c)$.
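The notions of Definition 5 can be checked on the graphs of figure 3 with a short Python sketch. The encoding is an assumption made for illustration: a vertex is named by the string of its single-character labels (so `"abd"` carries the labels a, b and d), an empty set of l-labelled vertices is treated as connected, and the function names are hypothetical. Only pairs of distinct vertices are counted as edges of the L_complete graph.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def is_l_connected(vertices: Iterable[str], edges: Iterable[Edge], label: str) -> bool:
    """True when the vertices carrying `label` induce a connected subgraph."""
    members = {v for v in vertices if label in v}
    if not members:
        return True  # assumption: nothing to connect
    adjacency = {v: set() for v in members}
    for v, w in edges:
        if v in members and w in members:
            adjacency[v].add(w)
            adjacency[w].add(v)
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:  # depth-first search restricted to the labelled vertices
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == members


def is_L_connected(vertices: Iterable[str], edges: Iterable[Edge], labels: Iterable[str]) -> bool:
    vertices, edges = list(vertices), list(edges)
    return all(is_l_connected(vertices, edges, l) for l in labels)


def l_complete_edges(vertices: Iterable[str]) -> Set[Edge]:
    """Edges of the L_complete graph: pairs of distinct vertices sharing a label."""
    return {(v, w) for v, w in combinations(sorted(vertices), 2) if set(v) & set(w)}


# Data taken from the caption of figure 3.
L = "abcd"
V: List[str] = ["abd", "bd", "bc", "cd", "ad", "ac"]
E1: List[Edge] = [("abd", "bd"), ("bd", "bc"), ("bc", "cd"),
                  ("cd", "ad"), ("ad", "ac"), ("ac", "abd")]
E: List[Edge] = E1 + [("abd", "cd"), ("cd", "ac")]

print({l: is_l_connected(V, E1, l) for l in L})
print(is_L_connected(V, E, L), len(l_complete_edges(V)))
```

On this data the cycle $E_1$ connects the a-labelled and b-labelled vertices but not the c-labelled or d-labelled ones, while the two extra edges of $E$ repair both, as stated in the caption.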

Definition 6. Let $Y' = \{Y_1, Y_2, \ldots, Y_m\}$ be the subset of $Y$ whose elements
are the non-empty intersections between the $X_i$ ($m < 2^k$). We say that the
L_connected labelled graph $G(L, V, E)$ is a graph representation of $X$ if and
only if:

1. there is a bijection $\lambda : L \to X = \{X_1, \ldots, X_k\}$; $l \mapsto x$

2. $\chi$, defined by $\chi(v) = y = \bigcap_{l \in m(v)} \lambda(l)$, is a bijection.

In the following, we denote by:

— $\mathcal{G}_V(L, V)$ the set of L_connected labelled graphs associated with a given set
of labels L and a set V of labelled vertices.

— $\mathcal{G}_P(L, V)$ the set of planar graphs belonging to $\mathcal{G}_V(L, V)$.

Fig. 4. An extended Euler diagram (L, C, Z) and its dual. We have $m(a) = m(z_1) = \{a\}$,
$m(abc) = m(z_2) = \{a, b, c\}$, $m(c) = m(z_3) = \{c\}$, $m(bc) = m(z_4) = \{b, c\}$ and
$m(b) = m(z_5) = \{b\}$.

3 Drawability of EED 

We have defined extended Euler diagrams and L_connected labelled graphs. 
These two notions are related: 

— a planar L_connected graph is deduced by duality from an extended Euler
diagram (cf. section 3.1).

— the drawing of a planar L_connected graph leads to a class of EED. We
briefly describe in section 3.2 a method to build one of them.

Thus showing the drawability of an extended Euler diagram is equivalent to
showing the planarity of the L_connected labelled graph associated with it. This
observation leads us to study the planarity of L_connected labelled graphs with
respect to the cardinality of the set of labels L.

3.1 From Extended Euler Diagrams to Planar L_Connected Labelled Graphs

Definition 7. The mapping $dual : \mathcal{EED} \to \mathcal{G}$; $(L', C, Z) \mapsto G(L, V, E)$ is defined
by:

G(L, V, E) = dual((L', C, Z)) if and only if

(i) there is a one-to-one mapping between L' and L
(ii) there is a bijection $\delta : Z \to V$; $z \mapsto v$ such that $m(z) = m(\delta(z))$

(iii) $e = (v, w) \in E$ if and only if $\delta^{-1}(v)$ and $\delta^{-1}(w)$ are adjacent along a portion
of curve of non-null length in the planar partition formed by C.

As a consequence, if (L, C, Z) is a diagram representation of X, then
dual((L, C, Z)) is a graph representation of X.

3.2 From L_Connected Labelled Graphs to Extended Euler Diagrams

Proposition 1. If there is a planar L_connected graph G = G(L, V, E) representing
X, then there is a class of extended Euler diagrams (L, C, Z) representing X.
These diagrams are such that G(L, V, E) = dual((L, C, Z)).

Fig. 5. The construction of an extended Euler diagram from a drawing of the
planar L_connected graph of figure 3. Each internal empty zone is drawn in grey and
is associated with a new vertex. A: a drawing of an L_connected labelled graph without
the dangling edge (abd, ab) and the empty zone associated with a non-triangular face.
B: the dangling edge (abd, ab) is drawn. C: the graph and its associated zones. D: the
extended Euler diagram with internal zones associated with an empty set of labels.




Let us outline the construction of one extended Euler diagram from
a planar L_connected labelled graph with the example of figure 5. Following [1],
we generate a straight-line drawing D(G) from G. When a face is triangular,
drawing the contours of the zones in this face is straightforward. In other cases, we
extend G by introducing special edges and vertices to obtain a triangulated
graph G'. These vertices correspond to zones having an empty set of labels and
provide better control over the drawing of the resulting diagram. The dangling
edges require a slightly different treatment, as shown in figure 5.

$$G(L, V, E) \xrightarrow{draw} D(G) \xrightarrow{extend} D(G') \xrightarrow{diag} (L, C, Z), \qquad dual((L, C, Z)) = G(L, V, E)$$

3.3 Planarity of L_Connected Labelled Graphs 

To study the planarity of an L_connected labelled graph, we use the graphs $K_n$
and $K_{n,n}$, $n > 2$:

— $K_n$ is the complete graph defined on n vertices: in $K_n$, every vertex is
adjacent to every other vertex

— $K_{n,n}$ is the complete bipartite graph consisting of two disjoint vertex sets
$V = \{v_1, ..., v_n\}$, $W = \{w_1, ..., w_n\}$ and the edge set $E = \{(v_i, w_j) \mid 1 \leq i, j \leq n\}$

and Kuratowski’s characterization of planar graphs [8]:

Theorem 1. A graph is planar if and only if it does not contain a subdivision
of $K_5$ or $K_{3,3}$ as a subgraph.

We already know that, if $card(V) = 2^{card(L)}$, the diagram to draw is a Venn
diagram, which has a planar representation (cf. [10]) for any value of card(L).
But this property does not hold in the general case. In fact we have (footnote 1):

1 The following proposition is another version of the planarity results for Euler’s Circles
presented by Lemon and Pratt in [9].

Fig. 6. Two non-planar L_connected labelled graphs: on the left a $K_5$ and on the right
a $K_{3,3}$



Proposition 2. Let k be the cardinality of L. When $k \geq 9$, there exists at least
one set of labelled vertices V for which all the graphs of $\mathcal{G}_V(L, V)$ are non-planar.

Proof. Suppose $L = \{a, b, c, d, e, f, g, h, i\}$ and $V = \{abc, def, ghi, adg, beh, cfi\}$.
Then $\mathcal{G}_V(L, V)$ contains only one L_connected labelled graph, which is a $K_{3,3}$
(cf. figure 6). □
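The counterexample used in this proof can be verified by enumeration. The sketch below is ours (plain Python, with vertex names listing their labels, which is an assumption of the sketch): it checks that every label is carried by exactly two vertices, so that each allowed edge is forced, and that the forced edges form a complete bipartite graph between {abc, def, ghi} and {adg, beh, cfi}.

```python
from itertools import combinations

V = ["abc", "def", "ghi", "adg", "beh", "cfi"]

# Edges allowed in a labelled graph on V: pairs of vertices sharing a label.
shared = {(v, w): set(v) & set(w) for v, w in combinations(V, 2)
          if set(v) & set(w)}

# Each label is carried by exactly two vertices, so the edge joining them
# must be present in every L_connected graph on V.
carriers = {l: [v for v in V if l in v] for l in "abcdefghi"}
forced = {tuple(c) for c in carriers.values() if len(c) == 2}

# Edge set of K_{3,3} between the "row" and "column" vertices.
rows, cols = V[:3], V[3:]
k33 = {(r, c) for r in rows for c in cols}

print(len(shared))            # 9
print(forced == set(shared))  # True: every allowed edge is forced
print(set(shared) == k33)     # True: the only candidate graph is K_{3,3}
```

Since the allowed edges and the forced edges coincide, the set of L_connected graphs on this V contains a single graph, as stated in the proof.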

In the next section, we will show that for card(L) < 9 and for any set of labelled
vertices V on L, $\mathcal{G}_V(L, V)$ contains at least one planar graph.

4 The Constructive Proof 

4.1 Sketch of the Constructive Proof 

In the rest of the paper, L will denote a set of labels and V a set of labelled 
vertices on L. We add constraints on V using the following results: 

Definition 8. Let V be a set of labelled vertices and v and w two vertices of V.

— v and w are said to be label_disjoint when $m(v) \cap m(w) = \emptyset$

— v is said to be label_included in w when $m(v) \subseteq m(w)$.

Proposition 3. Let W be a set of vertices such that $W \subset V$, every vertex of W
is label_included in a vertex of V, and $V_r = V \setminus W$. Then, if $\mathcal{G}_V(L, V_r)$ contains
a planar graph $G(L, V_r, E_r)$, $\mathcal{G}_P(L, V)$ is not empty.

Proof. Let $w \in W$ and $v \in V$ be such that w is label_included in v. Then
if we add the edge e = (v, w) to $G(L, V_r, E_r)$, we obtain an L_connected graph
$G(L, V_r \cup \{w\}, E_r \cup \{e\})$ which is still planar (the addition of e cannot
create a $K_{3,3}$ or a $K_5$ in $G(L, V_r \cup \{w\}, E_r \cup \{e\})$). By augmenting the
graph $G(L, V_r, E_r)$ in the same way for each $w \in W$, we obtain in the end an L_connected
planar graph on V. □
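The reduction used in this proof can be sketched in a few lines of Python. This is an illustration under our own assumptions: vertex names list their labels, the helper names `split_included` and `reattach` are hypothetical, and planarity itself is not checked here (adding a pendant vertex cannot create a subdivision of $K_5$ or $K_{3,3}$, which is the argument of the proof).

```python
def label_included(v, w):
    """m(v) is a subset of m(w); vertex names list their labels."""
    return set(v) <= set(w)

def split_included(V):
    """Split V into (V_r, W): W holds the vertices label_included in a vertex of V_r."""
    V_r, W = [], []
    # Largest label sets first, so a vertex can only be absorbed by a kept vertex.
    for v in sorted(V, key=lambda u: (-len(u), u)):
        (W if any(label_included(v, w) for w in V_r) else V_r).append(v)
    return V_r, W

def reattach(E_r, V_r, W):
    """Add one pendant edge (v, w) per removed vertex w, as in the proof."""
    E = list(E_r)
    for w in W:
        host = next(v for v in V_r if label_included(w, v))
        E.append((host, w))
    return E

# Vertices of figure 5; E_r is one L_connected planar graph on V_r (our choice).
V = ["abd", "bd", "bc", "cd", "ad", "ac", "ab"]
V_r, W = split_included(V)
E_r = [("abd", "ac"), ("abd", "bc"), ("ac", "bc"), ("bc", "cd"), ("abd", "cd")]
E = reattach(E_r, V_r, W)
print(V_r)           # ['abd', 'ac', 'bc', 'cd']
print(W)             # ['ab', 'ad', 'bd']
print(E[len(E_r):])  # [('abd', 'ab'), ('abd', 'ad'), ('abd', 'bd')]
```

Each removed vertex keeps all its labels connected through its host, because the host carries every label of the vertex attached to it.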

Corollary 1. Let W be the subset of V formed by vertices associated with
only one label and $V_r = V \setminus W$. Then, if $\mathcal{G}_V(L, V_r)$ contains a planar graph
$G(L, V_r, E_r)$, $\mathcal{G}_P(L, V)$ is not empty.

Proof. We use the fact that a vertex w of W is either label_included in a vertex v
of V or label_disjoint from every other vertex v of V. □

Using the previous corollary, we will restrict ourselves to sets of labelled
vertices V on L satisfying:

(H1) $\forall v, w \in V$, if v is label_included in w then v = w.

(H2) any vertex v of V has more than one label in m(v).

For a set V of labelled vertices satisfying (H1) and (H2), we proceed as follows
to show that $\mathcal{G}_P(L, V)$ is not empty when card(L) < 9:

1. we choose a subset $V_0$ of vertices of V among those satisfying $L(V_0) = L(V)$
and build an L_connected planar graph $G_0(L, V_0, E_0)$ on $V_0$ (cf. section 4.2).

2. we build a partition $V = V_0 \cup V_1 \cup ... \cup V_k$, $k \leq card(V_0)$ (cf. section 4.3).

3. then, for each $V_i$, $i \geq 1$, we show how to extend $G(L, V_0 \cup V_1 ... \cup V_{i-1}, E_{i-1})$
to obtain an L_connected planar graph $G(L, V_0 \cup V_1 ... \cup V_i, E_i)$. This is the
subject of section 4.4.



4.2 Choice of $V_0$ and Construction of $G_0$

Let $\mathcal{T}(V)$ be the set of subsets T of V such that L(T) = L(V) and $\mathcal{T}_0(V)$ be the
subset of $\mathcal{T}(V)$ formed by the elements T of $\mathcal{T}(V)$ having a minimum number
of vertices. Given T and T' in $\mathcal{T}_0(V)$, we rename the vertices of T and T' w.r.t.
the cardinality of their associated sets of labels, i.e. $T = \{v_0, v_1, ..., v_p\}$ and
$T' = \{v'_0, v'_1, ..., v'_p\}$ with $card(m(v_i)) \geq card(m(v_j))$ and $card(m(v'_i)) \geq card(m(v'_j))$
when i < j.

We say that $T >_L T'$ if and only if:

— $\exists k \leq p, \forall i < k$, $card(m(v_i)) = card(m(v'_i))$ and $card(m(v_k)) > card(m(v'_k))$.

$\mathcal{T}_{max}(V)$ is the set composed of the maximal elements of $\mathcal{T}_0(V)$ for $>_L$.

In the rest of the paper, $V_0$ denotes an element of $\mathcal{T}_{max}(V)$. Moreover, for
any subset W of V, Lu(W) denotes the set of labels belonging to only one m(v)
for $v \in W$, and $Lu_W(v) = Lu(W) \cap m(v)$.
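For small examples, the choice of $V_0$ and the set Lu(W) can be computed by brute force. The sketch below is an illustration only: the function names are ours, vertex names are assumed to list their labels, V is assumed non-empty, and the exhaustive search is not meant to be efficient.

```python
from itertools import combinations

def L_of(W):
    """L(W): union of the label sets of the vertices of W."""
    return set().union(*(set(v) for v in W))

def profile(T):
    """Label-set sizes in decreasing order; >_L compares profiles lexicographically."""
    return sorted((len(set(v)) for v in T), reverse=True)

def candidates_V0(V):
    """Minimum-size subsets T with L(T) = L(V) that are maximal for >_L."""
    target = L_of(V)
    for size in range(1, len(V) + 1):
        covers = [T for T in combinations(V, size) if L_of(T) == target]
        if covers:
            break
    best = max(profile(T) for T in covers)
    return [set(T) for T in covers if profile(T) == best]

def Lu(W):
    """Labels belonging to exactly one m(v), v in W."""
    return {l for l in L_of(W) if sum(l in v for v in W) == 1}

fig5 = ["abd", "bd", "bc", "cd", "ad", "ac", "ab"]
fig8 = ["ab", "cd", "ef", "gh", "aceg", "adeg"]
print(candidates_V0(fig5))         # three candidates, all containing abd
print(candidates_V0(fig8))         # the single candidate {ab, cd, ef, gh}
print(sorted(Lu(["abd", "cd"])))   # ['a', 'b', 'c']
```

On the vertices of figure 5 there are several maximal elements; the set {abd, cd} used in Remark 1 is one of them. On the vertices of figure 8 the search returns the $V_0$ given in that caption.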

The definition of $\mathcal{T}_{max}(V)$ and the hypotheses (H1) and (H2) on V imply
the following results:

-1- for any vertex v of $V_0$, $Lu_{V_0}(v)$ contains at least one label, thus
$card(Lu(V_0)) \geq card(V_0)$. As any m(v) contains at least two labels, we have:
$card(V_0) < card(L)$

-2- Any subset V' of $V_0$ inherits the minimality properties from $V_0$: V' is such
that we can have neither

(a) $\exists V'' \subset V$ s.t. $L(V') \subseteq L(V'')$ and $card(V') > card(V'')$
nor

(b) $\exists V'' \subset V$ s.t. $L(V') \subseteq L(V'')$, $card(V') = card(V'')$ and $V' <_L V''$

These two points will be used when computing an L_connected planar graph
on V:

Let us suppose that $V_0 = \{v_1, v_2, ..., v_k\}$ and card(L) = n. Then using (1), we
have $card(Lu(V_0)) \geq k$. Moreover, because of (2), $V \setminus V_0$ cannot contain a vertex v
such that $Lu_{V_0}(v_i) \cup Lu_{V_0}(v_j) \subseteq m(v)$ where $v_i$ and $v_j$ are two distinct
vertices of $V_0$. In fact, in such a case, $V' = V_0 \cup \{v\} \setminus \{v_i, v_j\}$ would contradict
the hypothesis of minimality of $V_0$.

Ensuring the Drawability of Extended Euler Diagrams for up to 8 Sets

[Figure 7: three example graphs, with panels labelled card(V_0) = 6, card(V_0) = 6 and card(V_0) = 7]

Fig. 7. Examples of labelled graphs $G_0$

Then, the presence of labels of $Lu(V_0)$ in the set of labels of a vertex of $V \setminus V_0$
is strongly constrained, and this fact drastically reduces the number of cases to
consider at each step of the proof.

Proposition 4. If card(L) < 9, $\mathcal{G}_P(L, V_0)$ is not empty.

We denote by $G_0$ an element of $\mathcal{G}_P(L, V_0)$.

Proof. Let $G(L, V_0, E_0)$ be an L_connected graph of $\mathcal{G}_V(L, V_0)$ having a minimal
number of edges (we give examples in figure 7). Then, as card(L) < 9 and
$card(V_0) < card(L)$, we have the following cases to consider:

— $card(V_0) \leq 4$. $\mathcal{G}_V(L, V_0) = \mathcal{G}_P(L, V_0)$ ($K_5$ and $K_{3,3}$ have respectively 5 and
6 vertices).

— $card(V_0) = 5$. If $G(L, V_0, E_0) = K_5$, as $E_0$ is minimal, each edge of $K_5$
would be associated with a label, thus $card(L) \geq 10$, which
contradicts the hypothesis.

— $card(V_0) = 6$. We have $card(Lu(V_0)) \geq 6$ and $card(L) \leq 8$. Then L contains
one or two labels which do not belong to $Lu(V_0)$.
If $L \setminus Lu(V_0) = \{l\}$, then $E_0$ consists of a path joining the
vertices having l in their set of labels and $G(L, V_0, E_0)$ is planar.
If $L \setminus Lu(V_0) = \{l, l'\}$, to build an L_connected graph on six
vertices connecting two labels, we need less than 10 edges.
Such a graph cannot contain any $K_{3,3}$ or $K_5$.

— $card(V_0) = 7$. As $L \setminus Lu(V_0) = \{l\}$, $E_0$ consists of a path joining the vertices
having l in their set of labels.

□



4.3 Construction of a Partition of $V \setminus V_0$

Let $G_0 = (L, V_0, E_0)$ be a planar L_connected graph. To extend $G_0$ with the
vertices of $V \setminus V_0$, we first partition $V \setminus V_0$ into a family of sets $V_1, ..., V_k$
with k < card(L).

Fig. 8. Insertion of vertices of $V_4$. $V = \{ab, cd, ef, gh, aceg, adeg\}$, $V_0 =
\{ab, cd, ef, gh\}$, $V_4 = \{aceg, adeg\}$, $E_0$ is empty. First, aceg is inserted by
adding four edges (aceg, ab), (aceg, cd), (aceg, ef), (aceg, gh). Then adeg is inserted by
adding two edges: (aceg, adeg), (adeg, cd).

A vertex v of $V \setminus V_0$ belongs to $V_i$ if and only if the addition of i edges connecting
v to $V_0$ is necessary and sufficient to L_connect v to $V_0$. We denote by $\mathcal{W}_{min}(v, V_0)$
the set of subsets W of $V_0$ such that card(W) = i and v is L_connectable to W.
Note that hypothesis (H1) on V implies that $V_1$ is empty.
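The partition can be computed by exhaustive search on small inputs. The following Python sketch rests on our reading of the definition: a vertex v belongs to $V_i$ when i is the smallest size of a subset W of $V_0$ to which v is L_connectable. The function names are hypothetical and vertex names are assumed to list their labels.

```python
from itertools import combinations

def L_of(W):
    """L(W): union of the label sets of the vertices of W."""
    return set().union(*(set(w) for w in W))

def minimal_supports(v, V0):
    """Return (i, supports): the least i such that v is L_connectable to some
    i-subset of V0, together with all such subsets."""
    for i in range(1, len(V0) + 1):
        supports = [set(W) for W in combinations(V0, i) if set(v) <= L_of(W)]
        if supports:
            return i, supports
    raise ValueError("v is not L_connectable to V0")

def partition(V, V0):
    """Group the vertices of V outside V0 by their index i."""
    parts = {}
    for v in V:
        if v not in V0:
            i, _ = minimal_supports(v, V0)
            parts.setdefault(i, []).append(v)
    return parts

# Figure 8: both aceg and adeg need all four vertices of V0.
print(partition(["ab", "cd", "ef", "gh", "aceg", "adeg"],
                ["ab", "cd", "ef", "gh"]))
# {4: ['aceg', 'adeg']}

# Vertices of figure 5 with V0 = {abd, cd}, as in Remark 1.
print(partition(["abd", "bd", "bc", "cd", "ad", "ac", "ab"], ["abd", "cd"]))
# {1: ['bd', 'ad', 'ab'], 2: ['bc', 'ac']}
```

The second call involves a set V that does not satisfy (H1), which is why $V_1$ is not empty there.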

Before extending $G_0$ with the $V_i$, we will give general results on the $V_i$, $i > 0$.

Lemma 1. If $v \in V_n$ and if $W_n = \{w_1, ..., w_n\}$ is an element of $\mathcal{W}_{min}(v, V_0)$,
then $card(m(v)) \geq n$ and $m(v) = \{l_1, ..., l_n\} \cup L_r$, with $L_r \subset L(W_n)$ and
$l_i \in Lu_{W_n}(w_i)$.

Proof. As v is in $V_n$, v is L_connectable to $W_n$. Therefore, if there were a $w_i$ in $W_n$
such that $m(v) \cap Lu_{W_n}(w_i) = \emptyset$, then v would be L_connectable to $W_n \setminus \{w_i\}$
and $W_n \notin \mathcal{W}_{min}(v, V_0)$. □

Proposition 5. If $V_n$, $n \geq 2$, is not empty then $\forall v \in V_n, \forall W_n \in \mathcal{W}_{min}(v, V_0)$,
$n \leq \frac{card(L(W_n))}{2}$.

Proof. Sketch of proof (the detailed proof can be found in [12]): Let us suppose
that $v \in V_n$ and $W_n \in \mathcal{W}_{min}(v, V_0)$. Then using lemma 1, we show that if
$card(L(W_n)) < 2n$, one can find a set of vertices $W' \subset W_n \cup \{v\}$ such that
$L(W_n) = L(W')$ and either $card(W') < card(W_n)$ or $card(W') = card(W_n)$ and $W_n <_L W'$.

□

Thus, using this result, we know that $V_n$ is empty when 2n > card(L). In
particular, when card(L) < 9, $V = V_0 \cup V_2 \cup V_3 \cup V_4$.
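Spelled out, the bound of Proposition 5 gives the following chain when card(L) is at most 8. It only uses the fact that $W_n$ is a subset of $V_0$, so that $L(W_n)$ is included in L; it restates the argument above and is not an additional result.

```latex
n \;\le\; \frac{\mathrm{card}(L(W_n))}{2}
  \;\le\; \frac{\mathrm{card}(L)}{2}
  \;\le\; \frac{8}{2} \;=\; 4
```

Hence only $V_2$, $V_3$ and $V_4$ can be non-empty besides $V_0$, since $V_1$ is empty by (H1).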



4.4 Construction of a L_Connected Planar Graph when card(L) < 9 

For the readability of the paper, we will give here an idea of the construction. 
The detailed description of the construction is presented in [12]. 

By definition of $V_i$, we can add i edges connecting a vertex v of $V_i$ to $V_0$ and
obtain an L_connected labelled graph G'. However, our goal here is also to preserve
the planarity of the graph while extending it. Thus, to build a planar labelled
graph on V, we incrementally insert vertices of $V \setminus V_0$ into the L_connected planar
graph $G_0$. We first insert vertices of $V_n$ with n maximal.

Fig. 9. Insertion of vertices of $V_2$ and $V_3$ in $G_0(L, V_0, E_0)$.

A: $V_0 = \{abgh, efg, cdh\}$, $E_0 = \{(abgh, efg), (abgh, cdh)\}$, $V_2 = \{af, dg\}$, $V_3 =
\{ace, ade, bdf\}$. The vertices of $V_3$ are inserted incrementally: the two label_disjoint
vertices ace and bdf are inserted in the two faces defined by abgh, efg and cdh. Then,
ade is inserted by adding only two edges connecting ade with ace and cdh.

B: $V_0 = \{ac, def, deg, bd\}$, $E_0 = \{(def, deg), (deg, bd)\}$, $V_2 = \{ae\}$, $V_3 =
\{afg, abe, abg\}$.

C: $V_0 = \{ab, cd, ef, fg, fh\}$, $E_0 = \{(fh, fg), (fg, ef)\}$, $V_2 = \{bf, ag\}$, $V_3 = \{acf\}$

— $V_4$ is non-empty only when card(L) = 8 and $card(V_0) = 4$. In this case, $V_4$
contains at most 2 elements (cf. figure 8): four edges are added for the first
element of $V_4$, and two edges are necessary to connect the second element
of $V_4$.

— When it is not possible to add i edges to $G_0$ to insert a vertex v of $V_i$
without breaking planarity, v is connected with another vertex of $V_i$
already inserted in $G_0$, as is the case in figure 9 A. This is always possible
when card(L) < 9: otherwise this leads to a contradiction with the hypothesis
on $V_0$. In fact, by using the partition of V into $V_0, ..., V_n$ in this process, we
have restricted the number of cases to consider to a few generic cases when
card(L) < 9.

We then obtain the following result: 

Theorem 2. When card(L) < 9, then for any set V of labelled vertices on L,
$\mathcal{G}_P(L, V)$ is not empty.

Then, using proposition 1, we have:

Corollary 2. For any set of non-empty distinct sets $X = \{X_1, ..., X_k\}$ such that
k < 9 there is an extended Euler diagram representing X.

Remark 1. Let us consider the set of vertices V of figure 5. $V_0 = \{abd, cd\}$, $V_1 =
\{ab, ad, bd\}$ and $V_2 = \{bc, ac\}$. $G(L, V_0, E)$ has only one edge (abd, cd) and the
L_connected labelled graph built by successively inserting the vertices of $V_2$
and $V_1$ leads to an extended Euler diagram where each label corresponds to
a connected region (cf. figure 10).

4.5 Hypergraph Vertex-Planarity: An Equivalent Formulation of the Problem

Extended Euler diagrams can be related to Johnson and Pollak’s notion of
planarity for hypergraphs [7].

Fig. 10. The L_connected labelled graph built by computing the $V_i$ and the
corresponding extended Euler diagram

Let H = (V, E) be a hypergraph and $X = \{X_1, ..., X_k\}$ be a set of non-empty
distinct subsets of X such that there are:

— a one-to-one map $\epsilon$ from the set of hyperedges E to $X = \{X_1, ..., X_k\}$,

— a map $\alpha$ between V and the set of all possible non-empty intersections
between the $X_i$, $Y_r = \{Y_1, Y_2, ..., Y_m\}$, satisfying: $\forall v \in V$, v belongs to the
hyperedge e of E if and only if $\alpha(v) \subseteq \epsilon(e)$.

If an extended Euler diagram (L, C, Z) is a diagram representation of X, then
(L, C, Z) is a vertex-based diagram representing the hypergraph H = (V, E)
and H is vertex-planar according to Johnson and Pollak’s definition.

Interpreting theorem 2, we obtain the following result on hypergraphs:

Corollary 3. Any hypergraph having at most eight hyperedges is vertex-planar.
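The correspondence between a hypergraph and a set of labelled vertices can be illustrated as follows. This is a sketch under our own assumptions: hyperedges play the role of labels, hypergraph vertices that belong to exactly the same hyperedges are merged into one zone, and the names `labelled_vertices` and `H` are hypothetical.

```python
def labelled_vertices(hyperedges):
    """hyperedges: dict mapping a hyperedge name to its set of vertices.
    Returns the distinct non-empty label sets, one per zone."""
    points = set().union(*hyperedges.values())
    return {frozenset(e for e, members in hyperedges.items() if p in members)
            for p in points}

# A small hypergraph with three hyperedges over the vertices 1..5.
H = {"e1": {1, 2, 3}, "e2": {3, 4}, "e3": {4, 5, 1}}
zones = labelled_vertices(H)
print(sorted("".join(sorted(z)) for z in zones))
# ['e1', 'e1e2', 'e1e3', 'e2e3', 'e3']
print(len(H) <= 8)  # True: Corollary 3 applies to this hypergraph
```

The resulting label sets are the labelled vertices to which theorem 2 applies when the number of hyperedges is at most eight.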

5 Conclusion 

We have shown that there exists a planar L_connected graph for any collection
of intersections between up to eight sets $\{X_1, ..., X_k\}$. This planar L_connected
graph can be used to build an extended Euler diagram representing $\{X_1, ..., X_k\}$.

Interpreting our work using Johnson and Pollak’s notion of planarity [7], we
have shown in this paper that any hypergraph having at most eight hyperedges
is vertex-planar.

We are currently working on the algorithm to produce the planar graph
and the extended Euler diagram. However, to reach the goals described in
the introduction, i.e. to create a semantically structured map of the results of
a complex query, we have to address a few more tasks. Indeed, for most
collections of intersections, there exist many planar graphs satisfying the
constraint of L_connectivity, and the graph built from the proof may not be
the best suited to our purposes. At this graph level, we may then have to
introduce some graphical criteria to provide the user with the most readable diagram.
Moreover, we still have to find the best embedding according to visibility and
usability criteria.

Abstract. This paper extends Shin's [1] and Hammer's [2] work by bringing in
the names of individuals within Venn diagrams and thereby proving the
soundness and completeness results. History: Beginning with Euler (1772),
diagrams have evolved in the hands of Venn (1880) and Peirce (1896). Each had
the aim of making the relationship between sets and binary properties of
emptiness and non-emptiness more and more clear. In recent years Shin (1994),
Hammer (1995) and Howse et al. (2001) have regenerated interest in diagrams by
formalizing the logic of diagrams.



1 Introduction 

We have extended the formalism of Shin [1] and Hammer [2] by incorporating the 
names of the individuals and absence of individuals in the set and developed the 
monadic predicate logic of diagrams. This work has some overlap with [6] but 
addresses widely divergent issues. 

2 The Language of Diagram 

Primitive symbols

Rectangle: the universe.
Closed curve: monadic predicates.
Shading: indicating emptiness.
x: cross indicating non-emptiness.
$a_1, a_2, a_3, ..., a_n$: names of individuals.
$\bar{a}$: absence of the individual named a.
$A_1, A_2, A_3, ..., A_m$: names for closed curves.
Line connecting crosses (x's) (lcc): in the degenerate case an lcc reduces to a single x.
Broken line connecting individuals ($a_i$'s) (lci): in the degenerate case an lci contains
just one individual symbol, say a.
Diagrammatic object: the items x, $a_i$, $\bar{a}_i$.

The x's ($a_i$'s) in a sequence of x's ($a_i$'s) in an lcc (lci) connected by a line (broken line) are called
nodes.

Def: Any closed curve without any diagrammatic object is called a blank closed
curve.

Def: Well formed diagram (wfd).

Type I: A single blank closed curve A within a rectangle is a wfd. A single closed
curve with one or more diagrammatic objects inscribed within it is a wfd. Following
are some examples:

Fig. 1.

Type II: A rectangle containing more than one closed curve such that each closed
curve cuts each other closed curve in exactly two points. The minimal regions so
formed may or may not contain diagrammatic objects. The closed curves
must not pass through x, a or $\bar{a}$. The diagram may also contain lcc or lci with the
restriction that none of the nodes appears more than once in the same minimal region.
If there is shading it has to cover an entire region. Below are some examples of type
II diagrams:

Type III: If $D_1, D_2, ..., D_n$ are of type I or type II and D' results from connecting
$D_1, D_2, ..., D_n$ by straight lines (written $D_1$ — $D_2$ — ... — $D_n$), then D' is a wfd.
Each $D_i$ of D' is called a component of the diagram.





3 Transformation Rules 



Introduction Rules 

for closed curves: If D is a diagram then a new closed curve can be drawn obeying
the restrictions given in the formation rules of type II diagrams and the following:

$I_1$: The newly introduced curve should not include any already existing x, a or $\bar{a}$.



144 L. Choudhury and M.K. Chakraborty 



$I_2$: If x or a occurs in A and B is the newly introduced curve, then x or a occurs in
the region $A \cap B$, with a line (or broken line) connecting the former x or a and the new one.

Fig. 4.



for $\bar{a}$:

$I_{3.1}$: If $\bar{a}$ is present in A then $\bar{a}$ should be introduced in the region $A \cap B$ induced by the
new curve B.

$I_{3.2}$: If a diagram D contains an a-sequence in some regions then $\bar{a}$ can be introduced
in the remaining regions.

$I_{3.3}$: If a region is shaded in a diagram D then in the same diagram we can eliminate
the shading and introduce $\bar{a}$.

for lcc or lci

$I_4$: If in a diagram D there is an x or a in some minimal region, then an x or a may be
introduced in any other non-shaded minimal region with a line connecting them.

$I_5$: for type III diagrams

If D is a wfd then for any wfd $D_1$ we have D — $D_1$



Elimination Rules

for lcc or lci.

$E_{1.1}$: An entire sequence of nodes of x's or a's from any diagram may be dropped.

$E_{1.2}$: If in a diagram containing a sequence of x's or a's some nodes fall in a shaded
region, we drop those nodes and continue the chain. If some of the a-nodes of an
a-sequence fall in a region containing $\bar{a}$, we drop those nodes and continue the chain
(Fig. 6).

Fig. 6.

Fig. 7.

$E_2$: From any diagram, the shade from any minimal region can be eliminated.

$E_3$: A closed curve can be eliminated from a type I, type II or type III diagram
along with all the diagrammatic objects present in it. The x or a outside the closed
curve joined by a line or broken line with an x or a within the closed curve shall be
retained (Fig. 7).

Elimination of objects from type I / type II / type III diagrams

$E_4$: Let r be any minimal region of a type I / type II / type III diagram and let there be
more than one $a_i$, x or $\bar{a}$ in r, all these objects being in the same region; then different
diagrams of the same type, with closed curves of the same names, can be obtained, each
containing exactly one of the above mentioned symbols in each appropriate minimal
region of the diagram (Fig. 8).




Fig. 8. Fig. 9. 



Inconsistency Rules

Def. (i) If in a chain of x's all the nodes belong to shaded regions, the diagram is
inconsistent. (ii) If in a chain of a-nodes all the occurrences of a co-exist with
occurrences of $\bar{a}$ or shading in the same region, the diagram is inconsistent. (iii) A type
III diagram is inconsistent if all its components are inconsistent.

$InC_1$: An inconsistent component from a type III diagram may be dropped (Fig. 9).
$InC_2$: Any diagram follows from an inconsistent diagram.



Unification Rule

for type I or type II diagrams

$U_1$: Diagrams $D_1$ and $D_2$ can be united into one diagram by following the introduction
rules and identifying the closed curves having the same name. We express this as D =
Uni($D_1$, $D_2$).

[Figure panels: $D_1$, $D_2$, Uni($D_1$, $D_2$)]

Fig. 10.

Fig. 11.



$U_2$: for a type III diagram with a type I / type II / type III diagram.

a) If $D_1$ — $D_2$ is a type III diagram and $D_3$ is a type I / type II diagram, then the unification
rule unites these two diagrams into Uni($D_1$, $D_3$) — Uni($D_2$, $D_3$), a type III
diagram.

b) If $D_1$ — $D_2$ and $D_3$ — $D_4$ are two type III diagrams then we get
Uni($D_1$, $D_3$) — Uni($D_2$, $D_3$) — Uni($D_1$, $D_4$) — Uni($D_2$, $D_4$) by this rule.
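Rule $U_2$ amounts to unifying every component of one diagram with every component of the other. The Python sketch below only illustrates this combinatorial pattern: representing a diagram as a frozenset of facts and taking `uni` to be a plain union are simplifying assumptions of ours, not the paper's definition of $U_1$.

```python
from itertools import product

def uni(d1, d2):
    # Stand-in for rule U_1 on type I / type II diagrams (assumption):
    # a diagram is a frozenset of facts and unification merges them.
    return d1 | d2

def uni_type3(left, right):
    """Rule U_2: unify each component of `left` with each component of `right`.
    A type I / type II diagram is passed as a one-component list."""
    # The outer loop runs over `right` to reproduce the order used in case b).
    return [uni(a, b) for b, a in product(right, left)]

D1, D2 = frozenset({"A shaded"}), frozenset({"x in A"})
D3, D4 = frozenset({"a in B"}), frozenset({"B shaded"})

case_a = uni_type3([D1, D2], [D3])      # Uni(D1, D3) — Uni(D2, D3)
case_b = uni_type3([D1, D2], [D3, D4])  # four components, as in case b)
print(len(case_a), len(case_b))         # 2 4
```

With this convention a type III diagram with m components united with one with n components yields m times n components.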

From these two diagrams by unification rule we get

Fig. 12.



4 Semantics 

We can extend the semantics given by Hammer [2] as follows:

We say a diagram D is true in a model M, written $M \models D$, iff
1) if r is shaded then $I(r) = \emptyset$;
2) if r has x then $I(r) \neq \emptyset$;
3) if r has a within it then $I(a) \in I(r)$;
4) if $\bar{a}$ is in r then not ($I(a) \in I(r)$);
5) if a — a ... — a is an lci, with a's occurring in regions $r_1, r_2, ..., r_n$, then
$I(a) \in I(r_1)$ or $I(r_2)$ ... or $I(r_n)$ (or being exclusive);
6) if x — ... — x is an lcc, with x's occurring in regions $r_1, r_2, ..., r_n$, then
$I(x) \in I(r_1)$ or $I(r_2)$ ... or $I(r_n)$ (or being inclusive).

With these truth conditions and the definition of the notion of logical consequence as in
Hammer [2], the following theorem can be established.
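The six truth conditions can be turned into a small model checker. The sketch below is an illustration under our own assumptions: a diagram is encoded as a dictionary with hypothetical keys, regions are named by strings, and condition 6 is read as "at least one region of the chain is non-empty".

```python
def is_true(diagram, I_region, I_name):
    """Evaluate conditions 1-6 for one type I / type II diagram.
    I_region maps a minimal region to a set; I_name maps a name to an element."""
    checks = []
    for r in diagram.get("shaded", []):
        checks.append(len(I_region[r]) == 0)            # 1) shaded region is empty
    for r in diagram.get("cross", []):
        checks.append(len(I_region[r]) > 0)             # 2) region with x is non-empty
    for a, r in diagram.get("present", []):
        checks.append(I_name[a] in I_region[r])         # 3) a lies in r
    for a, r in diagram.get("absent", []):
        checks.append(I_name[a] not in I_region[r])     # 4) a-bar: a is not in r
    for a, regions in diagram.get("lci", []):
        hits = sum(I_name[a] in I_region[r] for r in regions)
        checks.append(hits == 1)                        # 5) exclusive or
    for regions in diagram.get("lcc", []):
        checks.append(any(len(I_region[r]) > 0 for r in regions))  # 6) inclusive or
    return all(checks)

I_region = {"A only": {1}, "A and B": set(), "B only": {2}}
I_name = {"a": 1}

d = {"shaded": ["A and B"],
     "absent": [("a", "B only")],
     "lci": [("a", ["A only", "A and B"])],
     "lcc": [["A and B", "B only"]]}
print(is_true(d, I_region, I_name))                           # True
print(is_true({"cross": ["A and B"]}, I_region, I_name))      # False
```

A type III diagram, being a connection of alternatives, would be true when at least one of its components is true; that case is left out of the sketch.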

5 Conclusion 

For any finite set of diagrams $\Delta \cup \{D\}$,

1. If $\Delta \vdash D$ then $\Delta \models D$ (soundness theorem).

2. If $\Delta \models D$ then $\Delta \vdash D$ (completeness theorem).

Abstract. Projected contours enable Euler diagrams to scale better.
They enable the representation of information using less syntax and can
therefore increase visual clarity. Here informal reasoning rules are given
that allow the transformation of spider diagrams with respect to projected
contours.



1 Spider Diagrams and Projected Contours 

A spider diagram [3] is an Euler diagram with plane trees (spiders) that represent 
the existence of elements and shading in regions that indicate upper bounds on 
the cardinalities of sets they denote. Projected contours [1, 2] here are dashed 
and non-projected contours are called given contours. The semantics of projected 
contours are given in [1]: a projected contour represents the intersection of the 
set denoted by its label with the set denoted by its context (the smallest region, 
defined in terms of given contours, that it intersects). 

In Fig. 1, let our universe of discourse be the people attending Diagrams
2004. Let M be the set of mathematicians, Cog be the set of cognitive scientists,
Com be the set of people able to turn a computer on, and D be the set of people
able to draw a decent diagram. Both diagrams assert there is nobody who is
both a mathematician and a cognitive scientist, there is at least one cognitive
scientist who can turn on a computer and only one mathematician who can draw
a decent diagram. Note that d2 does not assert that no mathematicians are able
to turn on a computer, nor does it assert cognitive scientists are unable to draw
decent diagrams. The projected contour labelled D in d2 represents $M \cap D$ only.
Likewise the projected contour labelled Com in d2 only denotes $Cog \cap Com$.

Fig. 1. Two semantically equivalent spider diagrams

Fig. 2. Initial spider diagram

Fig. 3. Reasoning with projected contours
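The reading of projected contours in Fig. 1 can be mimicked with ordinary sets. In this minimal Python sketch the people and their skills are invented for illustration; only the rule "a projected contour denotes its label set intersected with its context" comes from the text.

```python
# Hypothetical universe of discourse (names invented for the example).
M   = {"ann", "bob"}           # mathematicians
Cog = {"cy", "dee"}            # cognitive scientists
Com = {"ann", "cy", "eve"}     # able to turn a computer on
D   = {"bob", "dee", "eve"}    # able to draw a decent diagram

def projected(label_set, context_set):
    """Set denoted by a projected contour: label set intersected with context."""
    return label_set & context_set

d2_D   = projected(D, M)       # projected contour D drawn inside M
d2_Com = projected(Com, Cog)   # projected contour Com drawn inside Cog

print(M & Cog)   # set(): nobody is both a mathematician and a cognitive scientist
print(d2_D)      # {'bob'}: exactly one mathematician can draw a decent diagram
print(d2_Com)    # {'cy'}: at least one cognitive scientist can turn on a computer
```

In this model dee is a cognitive scientist who can draw and ann is a mathematician who can turn on a computer. Neither fact is denoted by the projected contours of d2, which is why d2 asserts nothing about them.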



2 Reasoning Rules 

Here we introduce informally four of the reasoning rules that allow us to reason 
with spider diagrams that contain projected contours. We illustrate their appli- 
cation by using them to transform the spider diagram in Fig. 2 into the diagram 
of Fig. 4. The transformation is illustrated in Fig. 3. 

Note the spider diagram in Fig. 2 asserts $A \cap B$ is non-empty and D is a subset
of $A \cup B \cup C$ (equivalently, in the complement of $A \cup B \cup C$, D is empty).

1. Rule 1: Replacing a given contour with a projected contour. Applying
this rule may result in some of the existing projected contours being
erased. (We apply Rule 1 to the given contour labelled D in d1 giving d2).

2. Rule 2: Splitting a projected contour allows us to replace a projected
contour with two projected contours that partition the context of the
original. This rule is reversible. (We apply this rule to the projected contour
labelled D in d2 giving d3).

3. Rule 3: Erasing a projected contour allows us to erase any projected
contour providing we also erase partial shading and leave at most one foot
of each spider in a zone. (Applying this rule to d3 gives d4).

4. In transforming d4 into d5 we use rule 1 to replace the given contour
labelled C with a projected contour.

5. Rule 4: Erasing a shaded zone allows us to erase any shaded zone
provided no spiders touch it and the resultant diagram still represents it by
either exclusion or containment of contours. This rule is reversible. (We
apply this rule to the shaded zone of d5 to give d6).

6. To transform d6 into d7 we use rule 2 again, this time splitting the projected
contour labelled C. Finally, we transform d7 into d8 by erasing the projected
contour in $A \cup B$ using rule 3.

Fig. 4. Resultant, semantically equivalent, spider diagram

We have used four reasoning rules to transform d1 of Fig. 2 into the semantically
equivalent, and much clearer, d8 of Fig. 4. The reasoning rules 1-3 here
could similarly be used in Fig. 1 to transform d1 into d2.



3 Further Work 

Work on a system of spider diagrams to include projected contours is progressing.
Syntax, semantics and reasoning rules have been developed and formally defined
and work is currently underway to show this extended system to be both sound
and complete. More information on this and related works can be found at

www.cmis.brighton.ac.uk/research/vmg.

Abstract. In problem solving a goal/subgoal is either solved by generating
needed information from current information, or further decomposed into
additional subgoals. In traditional problem solving, goals, knowledge, and
problem states are all modeled as expressions composed of symbolic predicates,
and information generation is modeled as rule application based on matching of
symbols. In problem solving with diagrams on the other hand, an additional
means of generating information is available, viz., by visual perception on
diagrams. A subgoal is solved opportunistically by whichever way of
generating information is successful. Diagrams are especially effective because
certain types of information that are entailed by given information are explicitly
available - as emergent objects and emergent relations - for pickup by visual
perception. We add to the traditional problem solving architecture a component
for representing the diagram as a configuration of diagrammatic objects of three
basic types, point, curve and region; a set of perceptual routines that recognize
emergent objects and evaluate a set of generic spatial relations between objects;
and a set of action routines that create or modify the diagram. We discuss how
domain-specific capabilities can be added on top of the generic capabilities of
the diagram system. The working of the architecture is illustrated by means of
an application scenario.



1 Introduction 

Diagrams are ubiquitous as aids to human reasoning. They are potentially useful
representations either when the domain is intrinsically spatial, as in the case of maps
for route planning, or when there is a mapping from domain information to
two-dimensional space that preserves certain properties, as in the case of pie charts, where
the ratios of wedge angles are the same as ratios of certain quantities in the target
domain. Some of the ways diagrams help are not unique to diagrams as spatial
representation. For example, like many external representations, diagrams are a
memory aid. They also enable certain abstractions to stand out, by omitting details
that are not relevant. Maps work this way - in contrast to an aerial photo of the same
region of space, only certain objects are shown, and many of them are iconized rather
than spatially veridical, such as a church icon to represent the location of a church,
instead of the actual spatial extent of the church. But diagrams enable deeper use of
their spatiality in comparison to textual forms of external representation. Diagrams help
by enabling perception to pick up information corresponding to what one might call
emergent objects and emergent relations. An emergent relation is exemplified in
Fig. 1. Similarly, if one were to draw a diagram consisting of two intersecting curves,
visual perception can pick up a number of emergent objects, such as their intersection
points and the curve segments. If the curves represent roads, a problem solver can
make use of these new objects in, say, route planning. To the extent that recognition
of such objects and relations is easy for the perceptual system of a particular agent,
the agent can avoid complex sequences of reasoning or calculation that might
otherwise be required in deliberative problem solving to obtain the same information.

We distinguish the work in this paper from two other research streams in 
diagrammatic reasoning. In one stream, the focus is on algorithms that operate on 
representations of diagrams to produce new diagrams that represent some information 
of interest. The authors of such algorithms usually contrast them with symbolic 
reasoning, and propose that their algorithms exploit the structure of space in their data 
structures, e.g., local operations on pixel arrays. In another stream, the focus is on the 
logical aspects of diagrams, in particular on how proofs may be thought of as 
transformations of diagrams e.g., Venn diagrams as they are often used in informal set 
theory. The goal is to give diagrams and their manipulations the same logical status as 
sentences and their transformations in traditional logic. Our aim, however, is to 
understand reasoning and problem solving activities that combine representation and 
inferences in both diagrammatic and symbolic modes so that different subproblems 
are solved by the mode that is best suited for them. Research in this area may make 
use of algorithms and ideas in the other streams. For example, some of the inferences 
from diagrams - we call them Perceptual Routines and discuss them later in the paper
- may make use of algorithms in the first stream, and some decisions to transform
diagrams in certain ways may relate to symbolic reasoning. However, the focus in
our research is an architecture that best deploys each of the modes.

Our goal in this paper is to propose and develop a representation for diagrams that 
is as domain-independent as possible. The goal is to provide a diagrammatic 
representation that can help a computational problem solver, much as external 
diagrams help human problem solvers. We intend to present a certain intuition about 
a diagram as an abstract, but still spatial, notion. We also extend existing goal- 
directed problem-solving frameworks in such a way that diagrammatic 
representations as we conceive them can be used as a basic part of problem state 
representations, complementing the traditional predicate representations. Though we 
hope that much of what we say is consistent with important aspects of the human 
cognitive architecture, our motivation is not to propose a psychological theory, but 
rather to expand the options available to designers of AI and human-machine systems.

Fig. 1. The figure is generated from the description “A is to the left of B, and B is to the left of C.”
The relation “A is to the left of C” is emergent and can be picked up by human perception



2 Problem Solving: Information Generation 
for Goals and Subgoals 

Problem Solving as Goals and Subgoals. On a standard account in AI of problem 
solving (see [1] for the most evolved version), problem solving is a means-ends, or a 
subgoaling, process. The agent uses general and domain knowledge to decompose the 
goal into subgoals, subgoals into further subgoals, and so on, as needed. The task of 
problem solving then is to solve enough subgoals so that the main problem can be 
solved. Each important step in problem solving then is to solve a (sub)goal, or to 
decompose it into additional subgoals. For our current purpose, it is useful to think of 
goals in information terms: achieving a goal (whether the main or a subgoal) 
corresponds to having information of a certain description. Solving a (sub)goal 
corresponds to inferring the information needed from background general and domain 
knowledge, information defining the problem, and information generated in earlier
steps in problem solving. Thus, as each subgoal is solved, it may provide information 
that may help infer solutions to other subgoals, and finally the main goal. As 
information is generated, it results in change in the representation of problem state. 

Let us consider an information generation step in the above problem solving
process, say the generation of $I_2$ from $I_1$. For $I_1$ to support the generation of information $I_2$, $I_1$ must
entail $I_2$ in some sense. Information generation usually involves representing
information in some representation framework. In general, if $I_1$ is represented by a
representation $R_1$ in a representation system, not all the information entailed by
$I_1$ will be explicit in $R_1$. The task then becomes one of transforming $R_1$ into a
representation $R_2$ such that $R_2$ makes $I_2$ explicit. However, as has often been
remarked in the literature on diagrammatic reasoning (e.g., [2]), representation
systems differ in the degree to which the representation of information $I$ also
explicitly represents other information entailed by $I$. In diagrammatic reasoning, two
different representation systems are employed:

• Information generation by rule-based transformation of symbol strings. In most 
standard accounts, the goals, subgoals, and information are all represented in terms 
of symbolic predicates, such as the familiar On(A,B) in the blocks domain. In this
representation, only $I$ is explicit in $R$. Entailed information is generated by
applying rules to such expressions composed of predicates. For our purposes, AI
representations such as frames, rules, etc., still belong in this category. The
relevant point is that the representation is composed of predicates, and explicit
rules determine what information can be generated from what predicate
expressions.

• Information generation by perception from spatial representations. Diagrams
(more generally visual representations) are a class of representation whose 
instances automatically make some of the entailed information available, provided 
suitable pickup capabilities are present. The medium constrains the representation 
in such a way that it is not possible to represent I without representing a range of 
entailed information. In human use of diagrams, visual perception can pick up 
some of the entailed information, as in the example in Fig. 1.

The architecture that we describe has both of these representational and inference 
generation modalities, working together. The problem solving and the representation 
are bi-modal. 

Flexible Integration of Information Generation Modes. As mentioned earlier, as 
information is generated, the problem state representation changes. When diagrams 
are used, the change may be to either or both diagrammatic and predicate 
components. Change to the diagram may involve adding or deleting diagrammatic 
elements, or changing spatial specifications of the existing elements. As indicated, 
such changes give rise to emergent objects and relations in the diagram, which 
information may be picked up by perception in problem solving steps further down 
the line. In general, a source of power in problem solving is the way information 
generated at one step creates potential for further information generation in future 
steps. In diagrammatic reasoning, the combination of the two types of information 
generation operations can be especially powerful: information might be obtained in 
one step from the diagram by perception; the information so obtained might be 
combined in the next with other information in memory to support applying an 
inference rule to information in symbolic form, which might result in changes to the 
diagram, which might in turn give rise to emergent objects and relations that can be 
picked up by perception, and so on. Diagrams add a whole new level of power to this 
opportunistic goal-directed behavior that is problem solving. 

Diagrams Can Be Internal or in a Computer Memory. Notwithstanding the debates 
about the nature of mental images, there is a functional level at which for many 
problems a mental image plays a role similar to that of diagrams on external surfaces. 
That is, problem solving proceeds as if information is being obtained from a mental 
image, and the process of problem solving is similar to what happens when the 
diagram is external. The essential steps in the reasoning of a human problem solver 
are the same whether or not she draws an external diagram such as in Fig. 1, or
imagines some version of it. In either case, a step in the problem solving sequence is 
to assert that the problem solver sees that A is to the left of C in the instance the 
solver has created or imagined, and to generalize the conclusion from the instance to 
the general case. This sequence of steps is in marked contrast to the following
sequence: Represent the given information in Figure 1 as $\text{Left-of}(A,B) \land \text{Left-of}(B,C)$,
represent knowledge about transitivity as
$\forall x\,\forall y\,\forall z\,(\text{Left-of}(x,y) \land \text{Left-of}(y,z) \rightarrow \text{Left-of}(x,z))$, and apply the predicate
system's inference capability to generate the information $\text{Left-of}(A,C)$. In the
former, spatial relational information is “seen” and asserted for an instance, and
implicitly or explicitly generalized. In the latter, the reasoning starts with a general
rule - in this case about the transitivity of Left-of -, instantiates it, and infers the
required information by applying the rule. Whether the diagram is external or the
agent claims to solve it by means of a mental image, the structure of reasoning is
essentially similar, but different from that corresponding to the symbolic
predicate-rule-based inference process. In this sense, whatever the true underpinning of a
mental image, it is a functional equivalent of an external diagram with respect to the set
of information extraction operations.
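
The contrast between the two sequences can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names, the pair-based encoding of Left-of facts, and the x-coordinates chosen for the instance are hypothetical and are not taken from any implementation described in this paper.

```python
from itertools import product

def transitive_closure(facts):
    """Symbolic mode: apply Left-of(x,y) and Left-of(y,z) -> Left-of(x,z)
    repeatedly until no new fact can be derived."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        # product() is given a list copy, so adding to `closed` is safe here
        for (x, y), (y2, z) in product(list(closed), repeat=2):
            if y == y2 and (x, z) not in closed:
                closed.add((x, z))
                changed = True
    return closed

# Diagrammatic mode: one concrete instance. The coordinates are arbitrary
# (hypothetical) choices that respect the given Left-of facts.
diagram = {"A": (1.0, 0.0), "B": (2.0, 0.0), "C": (4.0, 0.0)}

def left_of(diagram, p, q):
    """Perceptual-routine sketch: read the relation off the instance."""
    return diagram[p][0] < diagram[q][0]

given = {("A", "B"), ("B", "C")}
symbolic_result = ("A", "C") in transitive_closure(given)  # derived by rule
perceptual_result = left_of(diagram, "A", "C")             # read off the instance
```

In the symbolic path the fact Left-of(A,C) exists only after the rule has been applied; in the diagrammatic path it is already present in the instance and merely has to be read.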



3 Diagrammatic Representation 

A representation system is characterized functionally by (i) a repertoire of 
representational primitives, (ii) ways of creating/modifying a representation to 
represent given information by composing primitives, and allowed ways of 
transforming them to make desired information explicit, and (iii) ways of reading 
information that is explicit in a representational instance. How these three functional 
characterizations apply to predicate-representations is well-known. For our diagram 
representation system, dubbed DRS, we describe below the basic set of primitive 
objects, creation/modification operations that we call Action Routines (ARs), and 
information reading capabilities that we call Perceptual Routines (PRs). The diagram
so constructed, along with a set of ARs and PRs, is intended to be functionally 
equivalent to an agent obtaining information from a diagram - external or internal - 
during problem solving. Defining DRS functionally avoids many of the contentious 
issues surrounding whether a computational representation is “really” a diagram, just 
as defining a mental image functionally at the end of the previous section was a way 
to avoid getting entangled with the question of whether mental images are “really” 
images. 

Three Dimensions of Perceptual Experience. We view the experience corresponding 
to visually perceiving a scene as having three dimensions. The first and most basic 
dimension is that of seeing the world as composed of objects - the figure, a set of 
objects, is discriminated from ground. The objects are perceived as shapes [3], an 
experience like that evoked by a Henry Moore abstract sculpture. With diagrams, this 
dimension corresponds to perceiving it as composed of objects - or figures - as 2-d 
shapes. The second dimension is that of seeing as, e.g., recognizing an object as a 
telephone, or a figure in a diagram as a triangle. The recognition vocabulary comes in 
a spectrum of domain-specificity. The third dimension is relational - seeing that an 
object is to the left of another object, taller than another, is part of another, etc. The 
result of these latter two dimensions of perception is not the spatiality of the objects as 
such, but symbols and symbol structures that assert and describe, albeit in a 
vocabulary using spatial terms. 

3.1 Diagrammatic Objects 

Our basic representation of a diagram is intended to capture the first dimension in the 
preceding discussion - in this case 2-d shapes of diagrammatic objects. Perceptual
routines then act on this to recognize object types and relations. Conversely, action
routines may construct or modify aspects of the diagram, as we will discuss shortly. 

A Diagram is a pair $(\mathcal{I}, \mathrm{DDS})$ where $\mathcal{I}$ is the image, defined as a specification -
implicit or explicit - of intensity values for points in the relevant regions of 2-D
space, and DDS is the Diagram Data Structure, which is a list of labels for the
diagrammatic objects in the image; associated with each object label is a specification
of the subset of $\mathcal{I}$ that corresponds to the object. A diagrammatic object can be one of
three types: point, curve, and region. Point objects only have location (i.e., no spatial
extent), curve objects only have axial specification (i.e., do not have a thickness), and
region objects have location and spatial extent. The labels are internal to DDS.
External labels such as A, B, etc., in Fig. 1 are additional features of the objects that
may be associated with them.




Fig. 2. A DRS for a simple diagram composed of a curve and a region. However, note that 
there are other objects: distinguished points such as end points, the closed curve defining the 
perimeter of the region, etc 

$\mathcal{I}$ is any description from which a specification of intensity values for the relevant
points in the 2-D space can be obtained. It can be extensional, such as a specification
of a 2-D array and the intensity values of the elements of the array. It can be
intensional, such as algebraic expressions or equations that describe point, curve or
region objects. Fig. 2 is an example DRS for a simple diagram.
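
To make the structure concrete, the following is a minimal Python sketch of a DDS holding one curve and one region, in the spirit of Fig. 2. The class and field names are hypothetical, and the choice to store a curve as a polyline and a region as a list of polygon vertices is an assumption made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float]

@dataclass
class DiagramObject:
    label: str            # label internal to DDS
    kind: str             # "point", "curve", or "region"
    spec: List[Coord]     # spatial specification (assumed vertex list)
    symbols: Dict[str, str] = field(default_factory=dict)  # external names, iconic roles

    def __post_init__(self):
        if self.kind not in ("point", "curve", "region"):
            raise ValueError(f"unknown object type: {self.kind}")
        if self.kind == "point" and len(self.spec) != 1:
            raise ValueError("a point has exactly one coordinate")

@dataclass
class DDS:
    objects: Dict[str, DiagramObject] = field(default_factory=dict)

    def add(self, obj: DiagramObject) -> None:
        self.objects[obj.label] = obj

    def of_kind(self, kind: str) -> List[str]:
        """Return the labels of all objects of the given type."""
        return [o.label for o in self.objects.values() if o.kind == kind]

dds = DDS()
dds.add(DiagramObject("c1", "curve", [(0.0, 0.0), (2.0, 1.0), (4.0, 1.0)],
                      {"icon": "railroad"}))
dds.add(DiagramObject("r1", "region", [(5.0, 0.0), (7.0, 0.0), (7.0, 2.0), (5.0, 2.0)]))
```

The `symbols` field stands in for the abstract symbols that carry iconic or domain-specific information; nothing in the sketch interprets them spatially.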

There are several important distinctions between an external diagram as an array of 
marks on paper - a raw image - and DDS. The first distinction is that DDS is the 
result of a figure-ground discrimination already made, the array interpreted as objects,
along with their spatial specifications. The second distinction is that DDS involves a 
particular type of abstraction. In external diagrams, points and curves are really 
regions, with some convention that signals to the user which of the regions are to be 
interpreted as abstract points and curves, and which to be taken as regions. DDS 
incorporates the abstraction. Thus, in Fig. 1, point A is shown as a circular region.
The DDS corresponding to point A, however, has as its spatial specification just the
intended coordinate of the point. Similarly, a curve in a physical diagram would have 
a non-zero thickness, whereas the corresponding representation in DRS will be as an 
abstract curve, i.e., as a type of object with no thickness. Next, those marks or visual 
elements in physical diagrams that have purely iconic roles are not represented 
spatially in DDS, even though in the physical diagram they have spatial extent. 
Examples are alphabetical symbols that are often placed next to diagrammatic 
elements to name them, icons such as that of a church on a map at a location to 
indicate the type of building, coloring that has an iconic role, e.g., red and blue 
objects standing for enemy and friendly units in a battlefield map, and the cross- 
hatching of a curve on a map that might indicate that it is a railroad. Object 
representations in DDS have fields that can hold these abstract symbols; e.g., the 
railroad will be represented by a curve in DDS, with an appropriate symbol in a field 
of the object representation. The problem solver would interpret it as a railroad. 
Next, DDS is generic. Information that is domain-specific, e.g., that a certain curve 
is a defensive perimeter, is simply treated as abstract symbolic information, and 
incorporated as labels attached to the objects to be interpreted by the problem solver 
using domain-specific conventions. At the level of DDS, there is no recognition of a 
curve as a straight line, or a region as a polygon. These are recognitions that will be 
made further downstream by perceptual routines. Also, how curves and regions are 
actually represented in a computational realization is not relevant at the level of DDS. 
For example, in a specific implementation, a curve may in fact be represented as a 
sequence of line segments (as we do in our implementation), or as a set of pixels in an 
array-based representation. These are representational details below the level of the 
abstraction of DDS, in which such an object is still a curve. 

Finally, in addition to specification of spatial extent, external diagrams may have 
visual aspects that are of representational significance beyond iconic. For example, 
grayscale intensity gradients across a region object in a diagram may represent the 
values of a real variable in a domain, such as a weather map in which the grayscale 
intensity at a point represents the temperature at that location. Perceptions such as 
“the intensity is least in this subregion,” and “the intensity increases from the center to 
the periphery of the region” convey information in an assertional form that may be 
useful for making further inferences. The DDS framework can be extended to capture 
the gradation of intensity values over space for the various objects. 

Diagrams are constructed by placing diagrammatic objects on a 2-D surface in 
specific configurations to represent specific information. DDS will initially consist of 
the labels of objects so placed and their spatial specifications. As objects in DDS are 
created, deleted or modified, new objects might emerge, objects may change their 
spatial extents, and existing objects might be lost. Perceptual routines will make these 
determinations and DDS will be updated to reflect these changes. DDS mediates the 
interaction of the problem solver with $\mathcal{I}$. For example, given a question such as “Is A
to the left of C?”, the information in DDS is used to identify the image descriptions
for A and C as arguments for the perceptual routine Left-of(X,Y). In principle,
the same image $\mathcal{I}$ might correspond to different DDS's depending on which subsets
are organized and recognized as objects, such as in the well-known example of a figure
that might be perceived as a wine glass or profiles of two faces.

In keeping with the functional notion, DDS can represent only a diagrammatic
instance (not a class), and its spatial specifications must be complete, just as in the 
case of an external diagram, and unlike in the case of predicate-based representations 
that can provide partial specification of a situation, or of classes of situations (such as 
for all). This does not mean that the agent is committed to all the spatial details in the 
DDS - it is the task of the problem solver to keep track of what aspects are intended 
to represent what. 

Representation in Memory. Just as problem state is bimodal, we think representation 
of knowledge in memory is bimodal as well, for agents that use diagrammatic
representations. We have route fragments in mind that we can bring to our 
consciousness as diagrams, and compose these fragments into larger routes. DRS 
fragments can be stored in memory, just as knowledge in symbolic predicate form. 

3.2 Perceptual Routines 

Ullman [4] proposed a set of elementary visual routines, such as visual search, texture 
segregation and contour grouping, that might be composed to perform complex visual 
tasks. Our notion of perceptual routines¹ (PRs) is based on a similar notion of
composable and extensible primitives, but more oriented to the needs of problem 
solving with diagrams. Because of our interest in generic objects, aspects of our 
proposals are intended to be domain-independent. 

A PR takes specified objects in a diagram as arguments and produces information 
corresponding to a specific perception. PRs can be categorized into two classes, the 
first set producing objects with spatial extent as their output, and the second set 
producing symbolic descriptions. The first includes PRs that identify emergent 
objects - points, curves, and regions - that are created when a configuration of
diagrammatic objects is specified or modified, and similarly objects that are lost. 
The PRs of the first class are domain-independent in that the point, curve, region 
ontology is completely general. The PRs of the second class produce symbolic 
descriptions belonging to one of three kinds: (i) specified properties of specified 
objects (e.g., curve C has length of m units), (ii) relations between objects (e.g., point
A is in region R, curve C1 is a segment of curve C2, object A is to the left of object B,
values of the angles made by intersection of curves C1 and C2), and (iii) symbols that
name an object or a configuration of objects as an instance of a class, such as a 
triangle or a telephone. 

The PRs of the second class come in different degrees of domain specificity.
Properties such as length of curve, area of a region, and quantitative and qualitative
(right, acute, obtuse, etc.) values of angles made by intersections of curves are very
general, as are subsumption relations between objects, such as that curve C1 is a
segment of curve C2. Relations such as Inside(A,B), Touches(A,B), and Left-of(A,B)
are also quite general. PRs that recognize that a curve is a straight line, a closed curve
is a triangle, etc., are useful for reasoning in Euclidean geometry, along with relations
such as Parallel(Line1, Line2). The PRs of the second class are open-ended in the
sense that increasingly domain-specific perceptions may be conceived: e.g., an L-shaped
region, Half-way-between(point A, point B, point C). Our goal for the current
set of PRs is what appears to be a useful general set, with the option for additional
special purpose routines later on. The following is a list of perceptual routines of
different types that we have currently identified and implemented as being generally
useful.

¹ We call them perceptual routines because one of our long-term goals is to extend the notion
of the cognitive state to multiple perceptual modalities, and vision is just one modality.

Emergent Object Recognition Routines: Intersection-points between curve objects, 
region when a curve closes on itself, new regions when regions intersect, new regions 
when a curve intersects with a region, extracting distinguished points on a curve (such 
as end points) or in a region, extracting distinguished segments of a curve (such as 
those created when two curves intersect), extracting periphery of a region as a closed 
curve. Reverse operations are included - such as when a curve is removed, certain 
region objects will no longer exist and need to be removed. 

Object Property Extraction Routines: Length-of(Curve) returns a number;
Straightline(Curve) and Closed(Curve) return True or False. Additional property
extraction routines can be added as needed.

Relational Perception Routines: Inside(I1,I2), Outside, Left-of, Right-of, Top-of,
Below, Segment-of(Curve1, Curve2), Subregion-of(Region1, Region2),
On(Point, Curve), Touches(Object1, Object2), and Angle-value(curve AB, curve BC).
Subsumption relations are especially important and useful to keep track of as objects
emerge or vanish.

Abstractions of Groups of Objects into Higher Level Objects: Objects may be 
clustered hierarchically into groups, such that different object abstractions emerge at 
different levels. 
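
A few of the property and relational routines listed above can be sketched as follows. The sketch assumes, hypothetically, that a curve is stored as a polyline (a list of vertices) and a region as a polygon; the function names mirror the routine names in the text, but the bodies are illustrative stand-ins, and Left-of is restricted to point objects.

```python
import math

def length_of(curve):
    """Length-of(Curve): sum of the polyline's segment lengths."""
    return sum(math.dist(p, q) for p, q in zip(curve, curve[1:]))

def closed(curve):
    """Closed(Curve): the polyline returns to its starting vertex."""
    return len(curve) > 2 and curve[0] == curve[-1]

def straightline(curve, tol=1e-9):
    """Straightline(Curve): every vertex is collinear with the first segment."""
    (x0, y0), (x1, y1) = curve[0], curve[1]
    return all(abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) <= tol
               for x, y in curve[2:])

def left_of(p, q):
    """Left-of for point objects: compare x-coordinates."""
    return p[0] < q[0]

def inside(point, region):
    """Inside(Point, Region): ray-casting test against the polygon's edges."""
    x, y = point
    hit = False
    n = len(region)
    for i in range(n):
        (xa, ya), (xb, yb) = region[i], region[(i + 1) % n]
        if (ya > y) != (yb > y):                    # edge spans the ray's height
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:                         # crossing lies to the right
                hit = not hit
    return hit

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
```

Each routine takes spatial specifications as arguments and returns a number or a truth value, that is, a symbolic description rather than a new spatial object.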

When to invoke PRs. As a diagram is created or modified, and given a repertoire of 
PRs, when a specific PR should be applied is a practical computational issue. It 
would be impractical to apply all of them and generate all the perceptions. In our 
current implementation, the only PRs that are immediately applied as a diagram is
created or changed are the emergent object routines, with all other PRs applied
only in response to a problem solving need. Even in the case of emergent objects,
the number of these objects can grow exponentially. For example, as a region is 
intersected by two curves, one horizontal and one vertical, we not only have four 
quadrant regions, NE, NW, SE and SW, but also regions that are composed of them, 
such as NW+SW+SE and NE+NW. In addition, for each of the curves, we have a large
number of segments as curve objects and emergent point objects as well. So our 
strategy has been to detect all first-order emergent objects, with other objects 
identified on an as-needed basis. By first-order, we mean emergent objects that do 
not have other emergent objects of the same type as subparts. In the example we just
considered, the four quadrant regions would be first-order emergent regions, and each 
of the curves will give rise to four first-order curve objects. 
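
The count of four first-order curve objects per curve can be checked with a small sketch. It assumes, for illustration only, a square region whose perimeter is a closed polyline and two straight curves given by their end points; the helper names are hypothetical, and parallel or collinear segments are simply ignored.

```python
def seg_intersection(p, q, a, b):
    """Return the point where segment pq crosses segment ab, or None.
    Crossings at the end points of pq itself are not reported."""
    (px, py), (qx, qy), (ax, ay), (bx, by) = p, q, a, b
    rx, ry = qx - px, qy - py
    sx, sy = bx - ax, by - ay
    denom = rx * sy - ry * sx
    if denom == 0:                      # parallel or collinear: ignored here
        return None
    t = ((ax - px) * sy - (ay - py) * sx) / denom   # position along pq
    u = ((ax - px) * ry - (ay - py) * rx) / denom   # position along ab
    if 0 < t < 1 and 0 <= u <= 1:
        return (px + t * rx, py + t * ry)
    return None

def first_order_segments(curve, others):
    """Split a straight two-point curve at every crossing with the other polylines."""
    p, q = curve
    cuts = set()
    for other in others:
        for a, b in zip(other, other[1:]):
            hit = seg_intersection(p, q, a, b)
            if hit is not None:
                cuts.add(hit)
    # order the cut points by distance from p along the curve
    ordered = sorted(cuts, key=lambda c: (c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2)
    pts = [p] + ordered + [q]
    return list(zip(pts, pts[1:]))

perimeter = [(-1, -1), (1, -1), (1, 1), (-1, 1), (-1, -1)]  # square region boundary
horizontal = [(-2, 0), (2, 0)]
vertical = [(0, -2), (0, 2)]

h_segments = first_order_segments(horizontal, [perimeter, vertical])
v_segments = first_order_segments(vertical, [perimeter, horizontal])
```

Each curve is cut at its two crossings with the perimeter and at the crossing with the other curve, so neither `h_segments` nor `v_segments` contains a segment that has another emergent segment as a subpart.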

Domain-specificity. Perceptions may be domain-specific because they are of interest 
only in some domains, e.g., “an L-shaped region.” They may also be domain-specific 
in that they combine pure spatial perception with domain-specific, but nonspatial, 
knowledge. For example, in a military application, a curve representing the motion of 
a unit towards a region might be interpreted as an attack, but that interpretation
involves combining domain-independent spatial perceptions - such as extending the
line of motion and noting that it intersects with the region - with non-spatial domain 
knowledge - such as that the curve represents the motion of a military unit, that the 
region's identity is as a military target belonging to a side that is the enemy of the unit 
that is moving, etc. In our current implementation, it is the task of the problem solver 
to combine appropriately the domain-independent perceptions with domain-specific 
knowledge to arrive at such conclusions, but in application-dependent 
implementations of the architecture, some of these perceptions might be added to the 
set of perceptual routines. 

3.3 Action Routines 

The problem solving process may modify the diagram - create, destroy or modify 
objects. Typically, the task - the reverse of perception in some sense - involves 
creating the diagram such that the shapes of the objects in it satisfy a symbolically 
stated constraint, such as “add a curve from A to B that goes midway between regions 
R1 and R2,” and “modify the object O1 such that point P in it touches point Q in
object O2.” Constructing or modifying diagrammatic elements that satisfy such
constraints involves a set of Action Routines parallel to Perceptual Routines. Again
similar to PRs, ARs can vary in generality. Deleting named objects that exist in the 
diagram, and adding objects with given spatial specifications, e.g., Add point at 
coordinate, Add curve <equation>, etc., are quite straightforward. Our ARs include 
translation and rotation of named objects for specified translation and rotation 
parameters. Because of the specific needs of our domain, we have ARs that find one 
or more representative paths from point A to point B, such that intersections with a 
given set of objects are avoided. A representative path has the right qualitative
properties (e.g., it avoids specific regions), but is a representative of a class of paths with
those qualitative properties, members of which may differ in various quantitative
dimensions, such as length. The set of ARs includes routines that adjust a path in this 
class to be shorter, longer, etc., in various ways. Extending lines indefinitely in 
certain directions so that a PR can decide if the extended line will intersect with an 
object of interest is one that we need in our domain. Other researchers have found 
specific sets of ARs that are useful for their tasks, such as the AR in [5], “Make a 
circle object that passes through points A, B, and C.” An AR that we haven't yet 
used, but we think would be especially valuable, is one that changes a region object 
into a point object and vice versa as the resolution level changes in problem solving, 
such as a city appearing as a point in a national map, while it appears as a region in a 
state map. 
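
The simplest of these routines, adding, deleting, translating and rotating named objects, might look as follows. This is a sketch under assumed representations (vertex lists for spatial specifications, a dictionary keyed by label); the class and method names are hypothetical.

```python
import math

def translate(spec, dx, dy):
    """AR sketch: shift every coordinate of an object's spatial specification."""
    return [(x + dx, y + dy) for x, y in spec]

def rotate(spec, angle_deg, center=(0.0, 0.0)):
    """AR sketch: rotate a spatial specification counterclockwise about center."""
    cx, cy = center
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(cx + c * (x - cx) - s * (y - cy),
             cy + s * (x - cx) + c * (y - cy)) for x, y in spec]

class Diagram:
    """Hypothetical container mapping a label to (kind, spatial specification)."""
    def __init__(self):
        self.objects = {}

    def add(self, label, kind, spec):
        self.objects[label] = (kind, list(spec))

    def delete(self, label):
        del self.objects[label]

    def modify(self, label, routine, *args):
        """Apply a routine such as translate or rotate to a named object."""
        kind, spec = self.objects[label]
        self.objects[label] = (kind, routine(spec, *args))

d = Diagram()
d.add("P", "point", [(1.0, 0.0)])
d.add("C", "curve", [(0.0, 0.0), (2.0, 0.0)])
d.modify("C", translate, 1.0, 1.0)   # C becomes [(1.0, 1.0), (3.0, 1.0)]
d.modify("P", rotate, 90.0)          # P moves to approximately (0.0, 1.0)
```

After any such modification the emergent-object routines would have to be run again, since objects may have appeared or vanished.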

Underspecification of spatial properties of objects. More generally, each of the PRs
can be reversed and a corresponding AR imagined. For example, corresponding to
the PR Inside(R1,R2) is the AR, “Make region R2 such that Inside(R1,R2),”
(assuming region R1 exists); and corresponding to Length(curve C1) is the AR,
“Make curve C1 such that Length(C1) < 5 units.” In most such instances, the spatial
specification of the object being created or modified is radically underdefined.
Depending on the situation, random choices may be made, or certain rules about
the creation of objects can be followed. However, the problem solver needs to keep track
of the fact that the reasoning system is not committed to all the spatial specification
details. For example, in Fig. 1, the points are placed in response to
the Left-of information regarding A, B and C. The problem solver needs to keep
track of the fact that there is no commitment to the actual distances between the points
in the figure.
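
One way to picture this bookkeeping is an action routine that returns, along with the coordinates it chose, a record of which aspects are committed and which are arbitrary. The sketch below is hypothetical in every detail: the function name, the unit spacing, and the `committed` and `arbitrary` fields are assumptions made for illustration.

```python
def place_points(left_of_facts, spacing=1.0):
    """Action-routine sketch: choose x-coordinates that satisfy every
    Left-of(a, b) fact, and record which aspects are mere drawing choices."""
    names = sorted({n for fact in left_of_facts for n in fact})
    x = {n: 0.0 for n in names}
    # Repeatedly push b to the right of a; len(names) passes suffice
    # when the facts contain no cycle.
    for _ in range(len(names)):
        for a, b in left_of_facts:
            if x[b] < x[a] + spacing:
                x[b] = x[a] + spacing
    if any(x[a] >= x[b] for a, b in left_of_facts):
        raise ValueError("cyclic Left-of facts cannot be drawn")
    return {
        "coords": {n: (x[n], 0.0) for n in names},
        "committed": {"left-to-right order"},
        "arbitrary": {"distances between points", "y-coordinates"},
    }

placement = place_points([("A", "B"), ("B", "C")])
```

A problem solver consulting `placement` may rely on A lying to the left of C, but not on B being midway between them, because the spacing was a drawing choice.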




Fig. 3. The architecture showing the perceptual and the symbolic parts as coequal information 
sources 

4 The Architecture 

Fig. 3 is a schematic of the problem solving architecture that is the subject of current
experimentation. The architecture is a functional diagrammatic problem solver even 
in the absence of an external diagram. This is because the perceptual routines work 
off DRS, and DRS may be constructed either from an external representation (by that 
part of visual perception that breaks the image array into objects, or manually, as in 
our experiments), from memory, or a combination. 

The problem solver sets up subgoals, and the representational system - the 
symbolic or the diagrammatic component - that can provide the solution responds. 
The problem solver may create or modify the diagram during problem solving. The 
modification is achieved by activating action routines that interact with the external 
representation, or it can simply change DRS - functionally corresponding to 
imagining changes. If a modification corresponds to a commitment to some part of a 
solution, say, a route in a route-planning problem, it would be appropriate to change 
the external representation; otherwise, such as when solutions are being hypothesized 
before being critiqued, a change to DRS is sufficient. These decisions are under the 
control of the problem solver. 

Fig. 3 doesn't bring this out explicitly, but the problem state is really bimodal.
Using a route planning example for illustration, suppose a certain state corresponds 
to, say, “Road A being taken eastwards.” The symbolic component of this information
- e.g., route-segment: (Road A, direction: East, Property1(A): 2-lane, Property2(A): toll, ...) - is in the
Symbolic Information box of the architecture. The corresponding diagrammatic
component, the curve object corresponding to Road A, and the other roads and 
intersection points will be in the DRS part. If the next subgoal is to calculate the cost 
of travel on Road A, the calculation will be done by the symbolic box by making use 
of additional information about costs from its background knowledge. On the other 
hand, if the next subgoal is to decide if point B is on the way on Road A going east, 
the DRS and the perceptual routines will be best positioned to solve this subgoal. 




Fig. 4. A scenario. T1, T2 and T3 are tanks, and p1, etc. are the corresponding locations. R1
represents a river, B1 a bridge, S1 is a sensor field and P1 and P2 are paths



A research goal is to see how far we can go in making the perceptual component 
co-equal with the symbolic component in problem solving. To take an example, there 
is no intrinsic reason why a problem solving goal has to be represented solely in 
symbolic terms, i.e., in terms of predicates that we wish to hold in the goal state, as in 
ON(A,B) in the Blocks world. In principle, goals may also be partly or wholly 
described in spatial terms, such as “Make an object whose shape is <this>,” where the 
referent of <this> is a shape, represented in DRS. Similarly, the output of a 
perceptual routine may be perceptual, such as when emergent objects are recognized. 
Or it may be symbolic, as when a relational routine returns a yes or no answer (such 
as to the question “Is point A to the left of region B?”), or a predicate (such as “A is 
inside B”). This information will be added to the store of symbolic information, and 
thus made available for supporting additional inferences. Conversely, the result of a 
symbolic rule application, e.g., “Attach component A to component B at point C,” 
might result in a DRS component being created representing the spatial configuration 
corresponding to this situation. This would make it possible for perceptual routines to 
be applied to the representation in DRS. 

5 Application Example

The US Army's All Source Analysis System or ASAS provides a picture of the enemy 
to its forces. The ASAS is continually updated by incoming intelligence reports such 
as sightings of enemy vehicles at different locations. Integrating incoming data into 
the system is done by human operators. One of the areas we are applying DRS to is in 
automating this process. In the example below, we intend to highlight the 
contributions that DRS makes rather than give the reader a comprehensive account of 
the problem solving process involved. 

The current situation is represented in Fig. 4. T1, T2 and T3 are three tank
sightings at locations p1, p2 and p3 at time instants t1, t2 and t3 respectively. Initially
ASAS has reported sightings of the two tanks T1 and T2. Sometime later, a new
report of a sighting of a tank at location p3 comes in. At this point, tank T3 could be
the result of either T1 or T2 having moved to the new location or a completely new
tank that was previously unreported. The system tries to eliminate some of these
possibilities in order to narrow down the hypothesis space. It tries to construct a path
from positions p1 and p2 to p3 over which a tank could have traveled within the given
time span. This process involves a complex interaction between the problem solver
and the diagrammatic reasoner.

Given the map, the emergent object recognition routine of the DRS identifies three 
subregions of the region object corresponding to the river. The problem solver (PS) 
retrieves from its knowledge base the information that a bridge is a region over which 
a river can be crossed. It marks the river sections as barriers, and the section covered 
by the bridge as crossable. These labels are domain-specific inferences made by the 
PS using domain knowledge. When the report of the new sighting comes in, the 
problem solver's task is to figure out whether T3 is either T1 or T2, having moved
since its original sighting, or a completely new tank. PS calls the DRS function -
one of the action routines - to construct representative paths (see Sec. 3.3), one from
p1 to p3 and the other from p2 to p3, that avoid the barrier regions. The PS calculates
whether the representative paths are short enough for the tanks to have moved from their
starting points to p3 within the given time span; if yes, the corresponding tank is
accepted as a potential candidate. If the problem solver decides that the constructed
path is too long, it can go back and ask the DRS to construct shorter paths, and the
process is repeated. If the distance covered by the shortest possible path is still too long
to be covered in the time allotted, the PS discards the corresponding tank as a possible
candidate. Let us look at another reasoning sequence. In the case of T2, any path
has to go over the bridge and hence over the sensor field at its foot. The problem
solver asks the DRS whether the path from p2 to p3 intersects any regions labeled S
(for sensor - DRS doesn't know anything about sensors, but the objects in DRS have
labels to mediate the interactions between PS and DRS). When the DRS comes back
and reports that the path goes over region S1, the PS goes back to the DRS asking it to
find an alternative path from p2 to p3 (it can do this by simply setting the sensor field
as a barrier). If no such alternative path exists, the PS can conclude that if T2 has
moved to p3, then it should have gone over the sensor field and there should be a
corresponding report about it somewhere in the database. The absence of such a report,
while not completely ruling out the possibility of T3 being T2 (due to the possibility
of sensor failure), does substantially decrease that possibility and, in turn, increases
the possibility that T3 is either T1 or a new tank.
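
The elimination logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system's actual code: the class Candidate, the function assess, the maximum-speed parameter, the units and all numeric values are hypothetical, and the path lengths are assumed to have been returned already by the DRS path-construction routine.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One earlier sighting that might explain the new sighting at p3 (hypothetical type)."""
    name: str                      # label of the earlier sighting, e.g. "T1"
    shortest_path_km: float        # shortest barrier-avoiding path to p3, as reported by the DRS
    elapsed_hours: float           # time between the earlier sighting and the new one
    crosses_sensor: bool = False   # True if every such path crosses a sensor field
    sensor_report: bool = False    # True if a matching sensor report is in the database


def assess(c: Candidate, max_speed_kmh: float) -> str:
    """Classify a candidate as 'rejected', 'unlikely' or 'plausible'."""
    if c.shortest_path_km > max_speed_kmh * c.elapsed_hours:
        return "rejected"    # even the shortest path is too long for the time span
    if c.crosses_sensor and not c.sensor_report:
        return "unlikely"    # possible only if the sensor failed to report
    return "plausible"


# Invented values, for illustration only.
t1 = Candidate("T1", shortest_path_km=30.0, elapsed_hours=2.0)
t2 = Candidate("T2", shortest_path_km=20.0, elapsed_hours=2.0, crosses_sensor=True)
verdicts = {c.name: assess(c, max_speed_kmh=40.0) for c in (t1, t2)}
```

The three-valued result reflects the text: a missing sensor report lowers the likelihood of a hypothesis without ruling it out.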



6 Related Work 

This work is an evolution of earlier general proposals made in [6]. The following is a 
sampling of work in diagrammatic research on mixing the symbolic and perceptual 
modes. Geometry theorem proving has especially been a task that has seen much 
research in this area, with [5] and [7] being well-known recent examples. 
Diagrammatic and symbolic representations in device behavior understanding are 
considered in [8]. While there are many points of contact between the representation 
of the diagram in these systems and DRS, our overall goal is a generic architecture for 
bimodal problem solving in contrast to the above research whose proposals for 
primitive objects and perceptions are tailored to the needs of their domains. In 
contrast, we have little to say about the specific ways in which diagrams work in their 
domains, but our more general representations may be specialized to their needs. 
Reference [9] describes a program that recognizes domain-specific objects (such as 
tea kettles) from diagrams composed of straight line segments, and extracts spatial 
relations between the objects, but does not deal with problem solving in general. 
Pineda [10] treats diagrams as sentences constructed from diagrammatic primitives 
and focuses on the semantics of such sentences. In an informal sense, DRS is such a 
sentence, with the semantics of the symbols for diagrammatic objects being directly 
represented by its spatial specification. Heterogeneous representations from a logic 
perspective are considered in [11], and in database retrieval in [12], neither of which 
is concerned with general issues in problem state representation. In work on general 
cognitive architectures such as ACT-R [13] and Soar [1], there has been significant 
recent emphasis on systems that deal with perceptual input, but the degree of 
integration of perception that we are proposing has not been attempted in those 
research areas. 

7 Discussion 

Diagrammatic reasoning is not a magic slayer of the computational dragon - it so
happens that humans have perceptual systems that work in parallel with deliberation,
so it makes sense for humans to make use of that computational capacity to reduce the
burden on deliberation, which is slow and memory-limited. Of what interest is
diagrammatic reasoning to AI, which is not necessarily restricted to the contingencies
of the design of the human architecture?

We propose three reasons. The first is that robotic systems would necessarily have, 
like humans, a perceptual system with a number of basic perceptual capacities built 
in. Thus, for robots solving problems, drawing an external diagram, or imagining one
in their equivalent of DRS, and applying perceptual routines might offer similar benefits.
Such systems may also find it useful to have the equivalent of visual memories of
their experiences, in contrast to the current approach where the robot extracts some
information about the world in a sentential form and stores it in its memory. DRS has
suggestive implications for the design of such memories. The second reason applies
even for AI systems more limited in their intended capabilities than integrated robotic 
systems. It is true that for a specific simple problem such as that in Fig. 1, it would 
make no sense to build the general purpose machinery such as that described in this 
paper. However, there are AI applications that have to interact with human users, 
with the human and the machine sharing the problem solving responsibilities in a 
flexible way. Planning and situation understanding problems of this sort with a large 
spatial component are generic enough that investment in general perceptual reasoning 
would make sense. A third reason is that much human knowledge in various domains 
has diagrammatic components. Exploiting such knowledge would benefit from an 
ability to use diagrams in reasoning. 

Abstract. An important aim of diagrammatic reasoning is to make it
easier for people to create and understand logical arguments. We have 
worked on spider diagrams, which visually express logical statements. 
Ideally, automatically generated proofs should be short and easy to un- 
derstand. An existing proof generator for spider diagrams successfully 
writes proofs, but they can be long and unwieldy. In this paper, we 
present a new approach to proof writing in diagrammatic systems, which 
is guaranteed to find shortest proofs and can be extended to incorporate 
other readability criteria. We apply the A* algorithm and develop an 
admissible heuristic function to guide automatic proof construction. We 
demonstrate the effectiveness of the heuristic used. The work has been 
implemented as part of a spider diagram reasoning tool. 



1 Introduction 

In this paper, we show how readability can be taken into account when generating 
proofs in a spider diagram reasoning system. In particular, we show how the A* 
heuristic search algorithm can be applied. 

If diagrammatic reasoning is to be practical, then tool support is essential. 
Proof writing in diagrammatic systems without software support can be time- 
consuming and error prone. 

In [4] we present a proving environment that supports reasoning with spider 
diagrams. This proving environment incorporates automated proof construction: 
an algorithm generates a proof that one spider diagram entails another, provided 
such a proof exists. This algorithm usually produces long and somewhat unwieldy 
proofs. These unnatural proofs suffice if one only wishes to know that there exists 
a proof. However, if one wishes to read and understand a proof then applying 
the algorithm may not be the best approach to constructing a proof. 

Readability and understandability of proofs are important because they: 

— support education. The introduction of calculators has not stopped us teach- 
ing numeracy. Similarly, automatic proof generators will not stop us teaching 
students how to construct proofs manually. An automated proof generator
can support the learning process, but only if it generates proofs similar to 
those produced by expert humans [22]. 

— increase notational understanding. People will need to write and verify the 
specifications (premise and conclusion) given to the automated proof gener- 
ator. Specification is the phase where most mistakes can be introduced [11]. 
A good understanding of the notation used is therefore needed. Reading (and 
writing) proofs helps to develop that understanding. 

— provide trust in generated proofs. As discussed in [11], automated proof 
generators can themselves contain flaws 1 and many people remain wary of 
proofs they cannot themselves understand [22]. 

— encourage lateral thinking. Understanding a proof often leads to ideas of 
other theorems that could be proved. 

Research has been conducted on how to optimize the presentation of proofs,
optimizing the layout of proofs or annotating proofs with, or even translating them
into, natural language (see for example [15], [3], [1]). The diagrammatic reasoning
community attempts to make reasoning easier by using diagrams instead
of formulas. For instance, we are working on presenting proofs as sequences of 
diagrams. 

In this paper, we enhance the proving environment presented in [4] by de- 
veloping a heuristic approach to theorem proving in the spider diagram system. 
We define numerical measures that indicate ‘how close’ one spider diagram is to 
another and these measures guide the tool towards rule applications that result 
in shorter and, therefore, hopefully more ‘natural’ proofs. In fact, the method 
is guaranteed to find a shortest proof, provided one exists. Note that heuristic 
approaches have been used in automated theorem provers before, but mainly to 
make it faster and less memory intensive to find a proof. This is a nice side effect 
of using heuristics, but our main reason is readability of the resulting proof. 
Often the heuristics used are not numerical, such as ‘in general, this rule should be
applied before that rule’ (e.g. [12]). When numerical heuristics have been used
(e.g. [5]), they have usually been used to find a proof quickly rather than to 
optimize the proofs. 

Many other diagrammatic reasoning systems have been developed, see [9] 
for an overview. For example, the DIAMOND system allows the construction 
of diagrammatic proofs, learning from user examples. The kinds of diagrams 
considered are quite different to spider diagrams. 

In [18] Swoboda proposes an approach towards implementing an Euler/Venn
reasoning system, using directed acyclic graphs (DAGs). The use of DAGs to 
compare diagrams was the focus of [19] in which two diagrams are compared 
to assess the correctness of a single diagram transformation. It is possible that 
the work on DAGs, if extended to assess a sequence of diagram transformations, 
could show some similarity to the measures we give in Section 4. The existing
DAGs work does not create proofs but rather checks the validity of a proof
candidate.

1 Even if an algorithm has been proved correct, the correctness proof or the
implementation could still contain errors.

Fig. 1. A Venn diagram and a spider diagram

The structure of this paper is as follows. In Section 2, we introduce a simpli- 
fied (unitary diagrams only) version of our spider diagram reasoning system. In 
Section 3, we discuss the A* heuristic search algorithm that we have applied to 
this reasoning system. In Section 4, we describe a so-called admissible heuristic 
function that measures the difference between two spider diagrams. In Section 5, 
we briefly discuss how the heuristic diagrammatic proof generator has been im- 
plemented, and evaluate its results. We conclude by discussing how the cost 
element of A* can be used to further enhance understandability (the shortest 
proof is not necessarily the easiest to understand), how the heuristic can be used
to support interactive proof writing, and what we expect from extending this 
approach to non-unitary spider diagrams and so-called constraint diagrams. 

2 Unitary Spider Diagrams 

In this section, we briefly and informally introduce our diagrammatic reasoning 
system. For reasons of clarity, we restrict ourselves in this paper to so-called 
unitary spider diagrams and the reasoning rules associated with those. We will 
briefly return to non-unitary diagrams in Section 6. 

Simple diagrammatic systems that inspired spider diagrams are Venn and 
Euler diagrams. In Venn diagrams all possible intersections between contours 
must occur and shading is used to represent the empty set. Diagram $d_1$ in Fig. 1
is a Venn diagram. Venn-Peirce diagrams [14] extend the Venn diagram nota- 
tion, using additional syntax to represent non-empty sets. Euler diagrams exploit 
topological properties of enclosure, exclusion and intersection to represent sub- 
sets, disjoint sets and set intersection respectively. Spider diagrams are based 
on Euler diagrams. Spiders are used to represent the existence of elements and 
shading is used to place upper bounds on the cardinalities of sets. A spider is 
drawn as a collection of dots (the feet) joined by lines. The diagram $d_2$ in Fig. 1 is
a spider diagram. Sound and complete reasoning rules for various spider diagram 
systems have been given, e.g. [4]. 

2.1 Syntax and Semantics of Unitary Spider Diagrams 

In this section, we will give an informal description of the syntax and semantics 
of unitary spider diagrams. Details and a formal description can be found at [17]. 

— A contour is a labelled shape in the diagram used to denote a set.

— A boundary rectangle is an unlabelled rectangle that bounds the diagram 
and denotes the universal set. 

— A zone, roughly speaking, is a bounded area in the diagram having no other 
bounded area contained within it. More precisely, a zone can be described 
by the set of labels of the contours that contain it (the containing label set) 
and the set of labels of the contours that exclude it (the excluding label set). 
A zone denotes a set by intersection and subtraction of the sets denoted by 
the contours that contain it and the contours that exclude it respectively. 

— A spider is a tree with nodes, called feet, placed in different zones. A spider 
touches a zone if one of its feet appears in that zone. The set of zones a spider 
touches is called its habitat. A spider denotes the existence of an element 
in the set represented by its habitat. Distinct spiders represent the existence 
of distinct elements. 

— A zone can be shaded. In the set represented by a shaded zone, all of the 
elements are represented by spiders. So, a shaded zone with no spiders in it 
represents the empty set. 

— A unitary diagram is a finite collection of contours (with distinct labels), 
shading and spiders properly contained by a boundary rectangle. 

The unitary diagram $d_2$ in Fig. 1 contains three labelled contours and five
zones, of which one is shaded. There are two spiders. The spider with one foot
inhabits the zone inside (the contour labelled) Cats, but outside Dogs and Mice.
The other spider inhabits the zone set which consists of the zone inside Mice and 
the zone inside Dogs but outside Cats. The diagram expresses the statement 
“no mice are cats or dogs, no dogs are cats, there is a cat and there is something 
that is either a mouse or a dog”. A semantically equivalent diagram could be 
drawn which presents the contours Dogs and Cats as disjoint. 

The semantics for spider diagrams is model-based. A model assigns sets to
diagram contours, and zones to appropriate intersections of sets (and their
complements). Zones which are absent from a diagram (for example if two contours
are drawn disjoint) correspond to empty sets. Spiders assert the existence of
(distinct) elements in the sets and shading (with spiders) asserts an upper bound
on the cardinality of the sets. A zone which is shaded but untouched by spiders
corresponds to the empty set in any model, but a zone which has a single-footed
spider and is shaded corresponds to a set with exactly one element. A full
description of the semantics can be found at [17].

2.2 Reasoning Rules 

In this section we will give informal descriptions of the reasoning rules for unitary 
spider diagrams. Each rule is expressed as a transformation of one unitary spider 
diagram into another. Formal descriptions can be found at [21]. 

Rule 1 Add contour. A new contour can be added. Each zone is split into two
zones (one inside and one outside the new contour) and shading is preserved.
Each foot of a spider is replaced by a connected pair of feet, one in each of the
two new zones. For example, in Fig. 2, diagram $d_2$ is obtained from $d_1$ by adding
the contour labelled B.

Fig. 2. Applications of add contour and delete shaded zone (diagrams $d_1$, $d_2$ and $d_3$)

Rule 2 Delete contour. A contour can be deleted. If, as a result, a spider has 
two feet in the same zone, these feet are contracted into a single foot. 

Rule 3 Add shaded zone. A new, shaded zone can be added. 

Rule 4 Delete shaded zone. A shaded zone that is not part of the habitat of
any spider can be deleted. For example, in Fig. 2, diagram $d_3$ is obtained from $d_2$
by deleting a shaded zone.

Rule 5 Erase shading. Shading can be erased from any zone. 

Rule 6 Delete spider. A spider whose habitat is completely non-shaded can 
be deleted. 

Rule 7 Add spider foot. A spider foot can be added to a spider in a zone it 
does not yet touch. 

With the semantics as sketched in the previous section, each of these rules 
has been proven to be sound. This means that an application of any rule to 
a diagram yields a second diagram representing a semantic consequence of the 
first diagram. A sequence of diagrams and rule applications also gives a semantic 
consequence, and in this logic system, is a proof. 

Let $d_1$ and $d_2$ be diagrams. We say $d_2$ is obtainable from $d_1$, denoted $d_1 \vdash d_2$,
if and only if there is a sequence of diagrams $(d^1, d^2, \ldots, d^m)$ such that $d^1 = d_1$,
$d^m = d_2$ and, for each $k$ where $1 \le k < m$, $d^k$ can be transformed into $d^{k+1}$
by a single application of one of the reasoning rules. Such a sequence of diagrams
is called a proof from premise $d_1$ to conclusion $d_2$.
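
As an illustration of this definition, the following Python sketch checks whether a given sequence is a proof. It assumes a hypothetical predicate one_step(a, b) that decides whether b results from a by a single rule application; that predicate is not specified here, and the integer example merely stands in for diagrams.

```python
def is_proof(sequence, premise, conclusion, one_step):
    """Return True if sequence is a proof from premise to conclusion.

    one_step(a, b) is an assumed predicate: True when b is obtained
    from a by a single application of one reasoning rule.
    """
    if not sequence:
        return False
    if sequence[0] != premise or sequence[-1] != conclusion:
        return False
    # Every consecutive pair must be linked by exactly one rule application.
    return all(one_step(a, b) for a, b in zip(sequence, sequence[1:]))


# Toy stand-in: "diagrams" are integers and the only rule adds one.
def toy_step(a, b):
    return b == a + 1
```

With this reading, a one-element sequence is a zero-length proof of a diagram from itself.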

3 A* applied to Proof Writing 

To construct a proof, a rule needs to be applied to the premise diagram, followed 
by another rule to the resulting diagram, and so on, until the conclusion dia- 
gram is reached. At any stage in the proof, multiple rules might be applicable. 
For instance, in Figure 2, many rules can be applied to diagram $d_1$, such as Add
Contour B, Delete Contour A, Delete Spider, Erase Shading, and Add Spider
Foot. Not all rule applications help to find a shortest proof (e.g., only applying
Add Contour B to $d_1$ would help to find a shortest proof to $d_2$), and some
rule applications might even make it impossible to find a proof (e.g., applying
Delete Spider to $d_1$ results in a diagram from which $d_3$ can no longer be proven).
Human proof writers might intelligently choose the next rule to apply, in order 
to reach the conclusion diagram as quickly as possible. The problem of decid- 
ing which rules to apply to find a proof is an example of a more general class 
of so-called search problems, for which various algorithms have been developed 
(see [10] for an overview). Some of these algorithms, so-called blind-search algo- 
rithms, systematically try all possible actions. Others, so-called best-first search 
algorithms, have been made more intelligent, and attempt (like humans) to in- 
telligently choose which action to try first. A* is a well known best-first search 
algorithm [6]. 

A* stores an ordered sequence of proof attempts. Initially, this sequence only 
contains a zero length proof attempt, namely the premise diagram. Repeat- 
edly, A* removes the first proof attempt from the sequence and considers it. If 
the last diagram of the proof attempt is the conclusion diagram, then an optimal 
proof has been found. Otherwise, it constructs additional proof attempts, by ex- 
tending the proof attempt under consideration, applying rules wherever possible 
to the last diagram. 

The effectiveness of A* and the definition of “optimal” is dependent upon the 
ordering imposed on the proof attempt sequence. The ordering is derived from 
the sum of two functions. One function, called the heuristic, estimates how far 
the last diagram in the proof attempt is from the conclusion diagram. The other, 
called the cost, calculates how costly it has been to reach the last diagram from 
the premise diagram. We define the cost of applying a reasoning rule to be one. 
So, the cost is the number of reasoning rules that have been applied to get from 
the premise diagram to the last diagram (i.e. the length of the proof attempt). 
The new proof attempts are inserted into the sequence, ordered according to the 
cost plus heuristic. 

A* has been proven to be complete and optimal, i.e. always finding the best 
solution (because all reasoning rules have cost 1, this means the shortest proof), if 
one exists, provided the heuristic used is admissible [2]. A heuristic is admissible 
if it is optimistic, which means that it never overestimates the cost of getting 
from a premise diagram to a conclusion diagram. As all reasoning rules have 
cost equal to one, this means that the heuristic should give a lower bound on 
the number of proof steps needed in order to reach the conclusion diagram. 
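
The search loop described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in the tool: the names a_star, successors and heuristic are hypothetical, states stand in for diagrams and must be hashable, and each rule application costs one, as in the text.

```python
import heapq
from itertools import count


def a_star(premise, conclusion, successors, heuristic):
    """Return a shortest list of states from premise to conclusion, or None.

    successors(state) yields the states reachable by one rule application.
    heuristic(state, conclusion) must never overestimate the steps remaining.
    """
    tie = count()  # tie-breaker so the heap never compares proof attempts
    # Each entry is (cost + heuristic, tie, cost, proof attempt).
    frontier = [(heuristic(premise, conclusion), next(tie), 0, [premise])]
    best_cost = {premise: 0}
    while frontier:
        _, _, cost, attempt = heapq.heappop(frontier)
        last = attempt[-1]
        if last == conclusion:
            return attempt
        if cost > best_cost.get(last, float("inf")):
            continue  # a shorter attempt already reached this state
        for nxt in successors(last):
            new_cost = cost + 1  # every reasoning rule has cost one
            h = heuristic(nxt, conclusion)
            if h == float("inf"):
                continue  # the heuristic reports that no proof exists from here
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(
                    frontier, (new_cost + h, next(tie), new_cost, attempt + [nxt])
                )
    return None


# Toy search space: states are integers, the "rules" are +1 and *2 (bounded).
def toy_successors(n):
    return [m for m in (n + 1, n * 2) if m <= 20]


# The constant-zero heuristic is admissible but uninformative.
proof = a_star(1, 10, toy_successors, lambda state, goal: 0)
```

The zero heuristic used in the toy call reduces the search to an uninformed one; a tighter lower bound would order the frontier so that fewer attempts are expanded.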

The amount of memory and time needed by A* depends heavily on the quality 
of the heuristic used. For instance, a heuristic that is the constant function zero 
is admissible, but will result in long and memory-intensive searches. The better 
the heuristic (in the sense of accurately predicting the length of a shortest proof), 
the less memory and time are needed for the search. In this paper, we present 
a highly effective heuristic for A* applied to proof writing in a unitary spider 
diagram reasoning system. 

4 Heuristic Function for Unitary Diagrams

As discussed above, in order to apply A* we need a heuristic function that 
gives an optimistic (admissible) estimate of how many proof steps it would take 
to get from a premise diagram to the conclusion diagram. In this section, we 
develop various measures between two unitary diagrams. The overall heuristic 
we use is built from these measures. The heuristic gives a lower bound on the 
number of proof steps required and is used to help choose rule applications when 
constructing proofs in our implementation. 

There are four types of differences that can be exhibited between two uni- 
tary diagrams: between the contour labels, the zones, the shaded zones and the 
spiders. 




[Figure 3 panels: $\mathit{Venn}(\mathit{ContourForm}(d_1))$ and $\mathit{Venn}(d_2)$]

$\mathit{AddShading}(d_1, d_2) = 0$
$\mathit{RemShading}(d_1, d_2) = 1$
$\mathit{RemSpiders}(d_1, d_2) = 1$
$\mathit{ChangedSpiders}(d_1, d_2) = 0$

$\mathit{UH}(d_1, d_2) = 2 + 1 + 0 + 1 + \min(0 + 1 + 0, 2) = 5$

Fig. 3. Calculating the heuristic: example 1

4.1 The Contour Difference Measure

The only rules that alter a unitary diagram’s contour set are Add Contour and 
Delete Contour, which respectively add and delete one contour at a time. Thus 
we define the contour difference measure between diagrams $d_1$ and $d_2$ to be
the size of the symmetric difference of the label sets $\mathit{Cont}(d_1)$ (the set of labels
of the contours of $d_1$) and $\mathit{Cont}(d_2)$:

$$\mathit{CDiff}(d_1, d_2) = |\mathit{Cont}(d_2) - \mathit{Cont}(d_1)| + |\mathit{Cont}(d_1) - \mathit{Cont}(d_2)|$$

The contour difference between $d_1$ and $d_2$ in Figure 3 is given by

$$\mathit{CDiff}(d_1, d_2) = |\{A, C\} - \{B, C\}| + |\{B, C\} - \{A, C\}| = 2$$

whereas the contour sets in Figure 4 are equal, so $\mathit{CDiff}(d_1, d_2) = 0$.
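
The contour difference measure is simple enough to state directly in code. The sketch below assumes that a diagram's contour labels are available as a Python set; the function name cdiff is ours.

```python
def cdiff(cont_d1, cont_d2):
    """Size of the symmetric difference of two contour label sets."""
    return len(cont_d2 - cont_d1) + len(cont_d1 - cont_d2)


# Contour label sets quoted in the text for Figure 3.
fig3 = cdiff({"A", "C"}, {"B", "C"})  # one contour to delete (A), one to add (B)
```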

[Figure 4 panels: $\mathit{Venn}(\mathit{ContourForm}(d_1))$ and $\mathit{Venn}(d_2)$]

$\mathit{AddShading}(d_1, d_2) = 0$
$\mathit{RemShading}(d_1, d_2) = 1$
$\mathit{RemSpiders}(d_1, d_2) = 0$
$\mathit{ChangedSpiders}(d_1, d_2) = 1$

$\mathit{UH}(d_1, d_2) = 0 + 0 + 0 + 0 + \min(1 + 1 + 1, 2) = 2$

Fig. 4. Calculating the heuristic: example 2

Fig. 5. Examples to justify capping measures at one



4.2 The Zone Difference Measures 



A diagram’s zone set can be altered by the rules Add Contour, Delete Contour,
Add Shaded Zone, and Delete Shaded Zone. The calculation of CDiff identified
contour differences between $d_1$ and $d_2$, and Add and Delete Contour rules would
have to appear in any proof from $d_1$ to $d_2$ to fix these differences. We need
to determine whether these rules are sufficient to account for changes in the
zone sets, or whether other applications of rules to add or delete shaded zones
are needed. In order to calculate the zone difference measures for the heuristic,
take $d_1$ and apply the relevant Delete Contour and Add Contour rules to make
a new diagram, $\mathit{ContourForm}(d_1, d_2)$, which has the same contour set as $d_2$. If
the zone sets of $\mathit{ContourForm}(d_1, d_2)$ and $d_2$ don’t match, we may also need to
use Add Shaded Zone and/or Delete Shaded Zone in any proof from $d_1$ to $d_2$.

Two zones are deemed equal if they have the same set of containing contour
labels and the same set of excluding contour labels. Define the two zone
difference measures between diagrams $d_1$ and $d_2$ to be

$$\mathit{AddZone}(d_1, d_2) = \begin{cases} 1 & \text{if } \mathit{Zones}(d_2) \not\subseteq \mathit{Zones}(\mathit{ContourForm}(d_1, d_2)) \\ 0 & \text{otherwise} \end{cases}$$

$$\mathit{RemZone}(d_1, d_2) = \begin{cases} 1 & \text{if } \mathit{Zones}(\mathit{ContourForm}(d_1, d_2)) \not\subseteq \mathit{Zones}(d_2) \\ 0 & \text{otherwise.} \end{cases}$$

The sum $\mathit{CDiff}(d_1, d_2) + \mathit{AddZone}(d_1, d_2) + \mathit{RemZone}(d_1, d_2)$ is an optimistic
estimate of the number of applications of Add Contour, Delete Contour, Add
Shaded Zone and Delete Shaded Zone which are required in a proof from $d_1$
to $d_2$.

In Figure 3, the diagram $\mathit{ContourForm}(d_1, d_2)$ has a zone not present in $d_2$ (the
zone in both contours B and C), so $\mathit{RemZone}(d_1, d_2) = 1$. All zones in $d_2$ are
present in $\mathit{ContourForm}(d_1, d_2)$, so $\mathit{AddZone}(d_1, d_2) = 0$. The sum of measures
is three, and this corresponds to three rule applications: Delete Contour (A),
Add Contour (B) and Delete Shaded Zone (intersection of B and C), which
have to be present in any proof from $d_1$ to $d_2$. In Figure 4, diagram $d_2$ has two
zones not in $\mathit{ContourForm}(d_1, d_2)$: the zone in A and C and the zone in B and C.
Although there are two extra zones, we still find that $\mathit{AddZone}(d_1, d_2) = 1$. The
sum above, in this case, is $0 + 1 + 0 = 1$ and any proof from $d_1$ to $d_2$ indeed
requires at least one step.

From these examples alone, the limiting of AddZone and RemZone to a maximum
value of one may seem counterintuitive. In Figure 5, diagrams $d_1$ and $d_2$
have $\mathit{CDiff}(d_1, d_2) = 1$ and $\mathit{AddZone}(d_1, d_2) = 1$. Because there are two zones
in $d_2$ which are not present in $\mathit{ContourForm}(d_1, d_2)$, one might conclude that
two applications of Add Shaded Zone would be required to transform $d_1$ into $d_2$,
and a suitable AddZone measure would be two. However, we can add just one
shaded zone to $d_1$ before adding the contour C. For this reason, the AddZone
measure is capped at a value of 1, even where $\mathit{ContourForm}(d_1, d_2)$ has more
than one zone that is not in $d_1$. Diagrams $d_3$ and $d_4$ in Figure 5 similarly justify
the capping of the RemZone measure at one.
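
The zone difference measures can be sketched in Python as follows. Several things here are assumptions of the sketch rather than part of the formal system: a zone is modelled as a pair (containing labels, excluding labels) of frozensets, shading and spiders are ignored, deleting a contour is taken to merge zones that become identical, and all function names are ours. The example diagrams are hypothetical, chosen only to reproduce the values quoted for Figure 3.

```python
def zone(inside, outside):
    """Build a zone from its containing and excluding label collections."""
    return (frozenset(inside), frozenset(outside))


def delete_contour(zones, label):
    """Drop a label from every zone; zones that coincide merge in the set."""
    return {(inside - {label}, outside - {label}) for inside, outside in zones}


def add_contour(zones, label):
    """Split every zone into a part inside and a part outside the new contour."""
    result = set()
    for inside, outside in zones:
        result.add((inside | {label}, outside))
        result.add((inside, outside | {label}))
    return result


def contour_form(zones_d1, cont_d1, cont_d2):
    """Zone set of d1 after its contour set has been made equal to that of d2."""
    zones = set(zones_d1)
    for label in sorted(cont_d1 - cont_d2):
        zones = delete_contour(zones, label)
    for label in sorted(cont_d2 - cont_d1):
        zones = add_contour(zones, label)
    return zones


def add_zone(zones_d2, zones_cf):
    """1 if d2 has a zone missing from ContourForm(d1, d2), else 0."""
    return 0 if zones_d2 <= zones_cf else 1


def rem_zone(zones_d2, zones_cf):
    """1 if ContourForm(d1, d2) has a zone missing from d2, else 0."""
    return 0 if zones_cf <= zones_d2 else 1


# Hypothetical d1: Venn diagram on contours A and C.
d1_zones = {zone("AC", ""), zone("A", "C"), zone("C", "A"), zone("", "AC")}
# Hypothetical d2: contours B and C drawn disjoint.
d2_zones = {zone("B", "C"), zone("C", "B"), zone("", "BC")}

cf = contour_form(d1_zones, {"A", "C"}, {"B", "C"})
measures = (add_zone(d2_zones, cf), rem_zone(d2_zones, cf))
```

Because both measures are subset tests, they return at most one however many zones differ, which matches the capping argued for above.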

4.3 The Shading Difference Measures 

So far, we have captured differences between contour sets and zone sets. We also 
want a way to compare the shading between diagrams. The shading can not only 
be affected by the rule Erase Shading, but also by any of the rules required to 
make the contour sets and the zone sets the same. 

To measure the difference in shading between d1 and d2, we take the diagram
ContourForm(d1, d2) and add shaded zones until the diagram is in
Venn form (every possible zone is present, given the contour label set), giving
Venn(ContourForm(d1, d2)). We also add shaded zones to d2 until d2 is
in Venn form, giving Venn(d2). The shading difference measures between
diagrams d1 and d2 are defined to be

\[
AddShading(d_1, d_2) =
\begin{cases}
\infty & \text{if } ShadedZones(Venn(d_2)) \nsubseteq ShadedZones(Venn(ContourForm(d_1, d_2))) \\
0 & \text{otherwise}
\end{cases}
\]

\[
RemShading(d_1, d_2) =
\begin{cases}
1 & \text{if } ShadedZones(Venn(ContourForm(d_1, d_2))) \nsubseteq ShadedZones(Venn(d_2)) \\
0 & \text{otherwise.}
\end{cases}
\]

The allocation of ∞ as the AddShading measure indicates that there is no
proof from d1 to d2. In the examples shown in Figures 3 and 4, a proof does
exist between d1 and d2, and in both cases AddShading(d1, d2) = 0. Also, in
both figures, Venn(ContourForm(d1, d2)) has more shading than Venn(d2), so
RemShading(d1, d2) = 1. The value of RemShading is capped at 1 for similar
reasons to the capping of AddZone and RemZone.

The sum CDiff + AddZone + RemZone + AddShading + RemShading is
an optimistic estimate of the number of applications of Add Contour, Delete
Contour, Add Shaded Zone, Delete Shaded Zone, and Erase Shading which are
required in a proof from d1 to d2 (i.e. the sum is a lower bound on the number
of proof steps required).

4.4 The Spider Difference Measures

The rules Delete Spider and Add Spider Foot change the number of spiders in
a diagram and the habitats of spiders, respectively. In addition, deleting and
reintroducing a contour can also affect the habitats of spiders.
We can see that no rules introduce spiders, so if d2 has more spiders than d1
then we define the spider difference measure to be infinite (in effect, blocking the
search for proofs between such diagrams). If we have fewer spiders in d2 than d1,
then the rule Delete Spider must have been applied and the difference between
the numbers of spiders contributes to the heuristic.

If a spider in d2 has a different habitat to all spiders in ContourForm(d1, d2)
(i.e. it’s unmatched in d1) then some rule must be applied to change the habitat
of a spider in d1 to obtain the spider in d2. That rule could be Add Spider
Foot. Alternatively, the deletion and reintroduction of a contour can change
many spiders’ habitats. We say that n spiders in d2 are unmatched in d1 if
the bag subtraction Bag(habitats of spiders in d2) - Bag(habitats of spiders in
ContourForm(d1, d2)) has n elements. Define

\[
RemSpiders(d_1, d_2) =
\begin{cases}
\infty & \text{if } NumSpiders(d_1) - NumSpiders(d_2) < 0 \\
NumSpiders(d_1) - NumSpiders(d_2) & \text{otherwise}
\end{cases}
\]

\[
ChangedSpiders(d_1, d_2) = \text{number of spiders in } d_2 \text{ unmatched in } d_1.
\]

In Figure 3, there is one spider in d1 and none in d2, so one application of
the Delete Spider rule is required in any proof from d1 to d2. The measure
RemSpiders(d1, d2) = 1, but there are no unmatched spiders in d2, so
ChangedSpiders(d1, d2) = 0. In Figure 4, d1 and d2 have the same number of
spiders, so RemSpiders(d1, d2) = 0. However, the spider in d2 is unmatched
in d1 so ChangedSpiders(d1, d2) = 1.
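
The two spider measures can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a habitat is modelled as a frozenset of zone names, and the bag subtraction is read as removing the habitats found in the contour form from the habitats found in d2. Python's `collections.Counter` serves as the bag, since its subtraction discards non-positive counts.

```python
import math
from collections import Counter


def rem_spiders(num_spiders_d1, num_spiders_d2):
    """Spiders that must be deleted; infinite if d2 has more spiders than d1."""
    difference = num_spiders_d1 - num_spiders_d2
    return math.inf if difference < 0 else difference


def changed_spiders(habitats_d2, habitats_contour_form):
    """Number of spiders in d2 whose habitat is unmatched in the contour form."""
    unmatched = Counter(habitats_d2) - Counter(habitats_contour_form)
    return sum(unmatched.values())


# Hypothetical habitats: each is the set of zones the spider's feet touch.
h_old = frozenset({"zone_A"})
h_new = frozenset({"zone_A", "zone_AB"})

# One spider deleted, none changed (the Figure 3 situation).
print(rem_spiders(1, 0), changed_spiders([], [h_old]))       # 1 0
# Same number of spiders, one with a new habitat (the Figure 4 situation).
print(rem_spiders(1, 1), changed_spiders([h_new], [h_old]))  # 0 1
```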



4.5 The Unitary Diagram Heuristic 

We have built seven measures by considering possible differences between the 
premise and conclusion diagrams. To give us a lower bound on the length of 
a proof with premise d1 and conclusion d2 we take the sum of the measures
described above, but limit the contributions from AddZone, RemShading and
ChangedSpiders to 2. This is because zones, shading and spiders can change
by applications of Delete Contour followed by Add Contour, as illustrated in
Figure 4. Unless we cap the heuristic as shown, it will fail to be admissible,
as required by the A* algorithm. Define the unitary diagram heuristic
between d1 and d2 to be the sum

\[
\begin{aligned}
UH(d_1, d_2) = {} & CDiff(d_1, d_2) + RemZone(d_1, d_2) \\
& + AddShading(d_1, d_2) + RemSpiders(d_1, d_2) \\
& + \min\{\, AddZone(d_1, d_2) + RemShading(d_1, d_2) + ChangedSpiders(d_1, d_2),\ 2 \,\}.
\end{aligned}
\]



Lemma 1. Let d1 and d2 be unitary diagrams. If UH(d1, d2) = ∞ then d1 ⊬ d2.

Theorem 1. Let d1 and d2 be unitary diagrams. If there is a proof with
premise d1 and conclusion d2 then that proof has length at least UH(d1, d2).
That is, the unitary diagram heuristic is optimistic.
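
The way the seven measures combine can be sketched in Python. This is an illustrative sketch, not the tool's code: all function and parameter names are hypothetical, and the shaded zones of the Venn forms are assumed to be given as Python sets. The example values are those reported for Figures 3 and 4 in the text, with CDiff = 2 for Figure 3 and CDiff = RemZone = 0 for Figure 4 inferred from the sums given in Section 4.2.

```python
import math


def add_shading_measure(shaded_contour_form, shaded_d2):
    """Infinite if Venn(d2) has a shaded zone that the Venn contour form lacks."""
    return 0 if shaded_d2 <= shaded_contour_form else math.inf


def rem_shading_measure(shaded_contour_form, shaded_d2):
    """1 (capped) if the Venn contour form has a shaded zone that Venn(d2) lacks."""
    return 0 if shaded_contour_form <= shaded_d2 else 1


def unitary_heuristic(cdiff, rem_zone, add_shading, rem_spiders,
                      add_zone, rem_shading, changed_spiders):
    """Sum the measures, capping the last three at a joint contribution of 2."""
    capped = min(add_zone + rem_shading + changed_spiders, 2)
    return cdiff + rem_zone + add_shading + rem_spiders + capped


figure_3 = unitary_heuristic(cdiff=2, rem_zone=1, add_shading=0, rem_spiders=1,
                             add_zone=0, rem_shading=1, changed_spiders=0)
figure_4 = unitary_heuristic(cdiff=0, rem_zone=0, add_shading=0, rem_spiders=0,
                             add_zone=1, rem_shading=1, changed_spiders=1)
print(figure_3, figure_4)  # 5 2
```

In the Figure 4 case the three capped measures sum to 3, but only 2 is counted, reflecting that one Delete Contour followed by one Add Contour can account for all three differences. An infinite AddShading or RemSpiders value makes the whole sum infinite, which corresponds to Lemma 1.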

The proof strategy uses induction on lengths of proofs. Given a shortest proof
of length n between d1 and d2, let d_next be the second diagram in the proof.
The proof from d_next to d2 has length n - 1 and so by induction,
n - 1 ≥ UH(d_next, d2). We consider in turn the seven rules which could have
been applied to d1 to obtain d_next. In each case, we find a relationship
between UH(d1, d2) and UH(d_next, d2) which allows us to deduce that
n ≥ UH(d1, d2).

5 Implementation and Evaluation 

The heuristic approach to proof-finding has been implemented in Java as part
of a proving tool (available at [20]). The user can provide diagrams and ask the
prover to seek a proof from one diagram to another. The tool allows the user to
give a restriction on the rule-set used. Moreover, in order to assess the benefits
gained from the heuristic defined in Section 4, the user can choose between
the zero heuristic (ZH) and the unitary diagram heuristic (as defined above).
The zero heuristic simply gives ZH(d1, d2) = 0, for any diagrams d1 and d2.
The A* algorithm, when implemented with the zero heuristic, simply performs 
an inefficient breadth-first search of the space of all possible proof attempts 
(given that we have assumed the cost of each rule application to be 1). 
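
The effect of swapping heuristics can be seen in a small, generic A* sketch in Python. It does not reproduce the Java tool: the state space is a toy one (integers on a line, with "rules" that add or subtract one), and the names `a_star`, `successors` and `expansions` are hypothetical. The number of expanded states stands in for the number of proof attempts.

```python
import heapq
from itertools import count


def a_star(start, goal, successors, heuristic):
    """Unit-cost A* search. Returns (path, number of expanded states)."""
    tie = count()  # tie-breaker so the heap never compares states or paths
    frontier = [(heuristic(start, goal), next(tie), 0, start, [start])]
    best_cost = {start: 0}
    expansions = 0
    while frontier:
        _, _, cost, state, path = heapq.heappop(frontier)
        if state == goal:
            return path, expansions
        if cost > best_cost.get(state, float("inf")):
            continue  # a cheaper route to this state was already found
        expansions += 1
        for nxt in successors(state):
            new_cost = cost + 1  # every rule application costs 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                priority = new_cost + heuristic(nxt, goal)
                heapq.heappush(frontier,
                               (priority, next(tie), new_cost, nxt, path + [nxt]))
    return None, expansions


def step(n):
    """Toy successor function: move one step left or right."""
    return [n - 1, n + 1]


def zero(state, goal):
    """The zero heuristic: no guidance, so the search is breadth-first."""
    return 0


def distance(state, goal):
    """An admissible heuristic for the toy space."""
    return abs(goal - state)


path_zero, attempts_zero = a_star(0, 5, step, zero)
path_informed, attempts_informed = a_star(0, 5, step, distance)
print(len(path_zero) == len(path_informed))  # True: both paths are shortest
print(attempts_informed < attempts_zero)     # True: fewer states expanded
```

Both runs return a shortest path, but the admissible heuristic expands fewer states, which is the kind of saving measured below for the unitary diagram heuristic.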

Both heuristics succeed in finding proofs - the zero heuristic taking longer 
than the unitary diagram heuristic. The application records how many proof at- 
tempts were made during the search. This number can be thought of as a mem- 
ory and time burden. The savings made using our heuristic, over using the zero 
heuristic, can be seen by comparing the number of proof attempts. In an extreme 
case, during the data collection described below, the zero heuristic required 70795 
proof attempts while our heuristic only required 543 proof attempts. 

We generated a random sample (size n = 2400) of pairs of unitary diagrams,
d1 and d2, for which d1 ⊢ d2. These diagrams had at most three contours,
two spiders and had a shortest proof length of four steps between them
(these choices were arbitrary - similar results can be obtained by using
different data sets). For each pair, we recorded the number of proof attempts each
heuristic took to find a shortest proof from d1 to d2. Since we are interested in
the proportional saving, we calculated the ratio n1/n2 where n1 is the number of
proof attempts for the unitary diagram heuristic and n2 is the number of proof
attempts for the zero heuristic. A histogram showing the ratios obtained and
their frequencies can be seen in Figure 6, as can a scatter plot of the raw data.

We found that the unitary diagram heuristic takes, on average, less than 4% 
of the number of proof attempts that the zero heuristic takes. 

The A* search algorithm was implemented in two ways - one stopping con- 
dition guarantees that one shortest proof was found (as discussed above), and 
a stronger stopping condition guarantees that all shortest proofs were found. 
This second stopping condition was implemented so that the data collected was 
not affected by the order in which rules were applied in the search process. Even 
though it’s slower, the collection of all shortest proofs (instead of just one proof) 
could be of value in terms of maximizing readability of the outcome. Using the
stronger stopping condition, the unitary heuristic takes, on average, less than
1.5% of the number of proof attempts that the zero heuristic takes.

Fig. 6. Histogram showing the frequencies for each ratio and a scatter plot

6 Conclusion and Further Work 

In this paper, we have demonstrated how a heuristic A* approach can be used 
to automatically generate optimal proofs in a unitary spider diagram reasoning 
system. We regard this as an important step towards generating readable proofs. 
However, our work has been limited in a number of ways. 

The first limitation is that we have assumed the cost of applying each rule to 
be equal. This results in the optimal proofs being found by A* being the short- 
est proofs. However, as already indicated in the introduction, the conciseness of 
a proof does not have to be synonymous with its readability and understand- 
ability. The cost element of the evaluation function can be altered to incorporate 
other factors that impact readability. For instance: 

— Comprehension of rules. There may be a difference in how difficult each rule 
is to understand. This might depend on the number of side effects of a rule. 
For instance, Delete Spider only deletes a spider without any side effects, 
while Add Contour does not only add a new contour, but can also add new 
feet to existing spiders. This might make an Add Contour application more 
difficult to understand. In a training situation, a student might already be 
very familiar with some rules but still new to other rules. We can model 
a difference in the relative difficulty of rules by assigning different costs. As 
long as we keep the minimum cost equal to 1, this would not impact the 
admissibility of the heuristic. 

— Drawability of diagrams. As discussed in [7] not all diagrams are drawable, 
subject to some well-formedness conditions. These conditions were chosen to
increase the usability of diagrams. For instance, a diagram with two contours
which are super-imposed could be hard for a user to interpret, and there is no
way of drawing that diagram without concurrent contours, or changing the
underlying zone set. Ultimately, we want to present a proof as a sequence of
well-formed diagrams with rule applications between them. A proof would be
less readable if an intermediate diagram could not be drawn. We can model 
this by increasing the cost of a rule application if the resulting diagram is 
not drawable. 
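
A minimal Python sketch of such a cost function is given below. The rule weights and the drawability penalty are invented for illustration; only the requirement that no cost falls below 1 comes from the discussion above.

```python
# Hypothetical per-rule costs; rules not listed default to the minimum cost of 1.
RULE_COST = {
    "Delete Spider": 1.0,  # no side effects
    "Add Contour": 2.0,    # may also add feet to existing spiders
}


def step_cost(rule, result_is_drawable, rule_cost=RULE_COST, penalty=1.0):
    """Cost of one rule application, never below 1 so the heuristic stays admissible."""
    base = rule_cost.get(rule, 1.0)
    if base < 1.0 or penalty < 0.0:
        raise ValueError("costs below 1 could make the heuristic overestimate")
    return base + (0.0 if result_is_drawable else penalty)


print(step_cost("Delete Spider", True))   # 1.0
print(step_cost("Add Contour", False))    # 3.0
```

Because every step costs at least 1, a heuristic that is a lower bound on the number of steps remains a lower bound on the total cost.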

Empirical research is needed to determine which other factors might impact 
readability, what the relative understandability of the rules is and to what extent
that is person dependent, and how to deal with non-drawable diagrams. 

The second limitation is that we have restricted ourselves to discussing the 
case of unitary spider diagrams. To enhance the practical usefulness of this 
work, we will need to extend this first to so-called compound diagrams, and 
next to the much more expressive constraint diagrams. Compound diagrams are
built from unitary diagrams, using connectives ⊔ and ⊓ to present disjunctive
and conjunctive information. If D1 and D2 are spider diagrams then so are
D1 ⊔ D2 (“D1 or D2”) and D1 ⊓ D2 (“D1 and D2”). In addition to the reasoning
rules discussed in Section 2.2 which operate on the unitary components, many
reasoning rules exist that change the structure of a compound diagram [21].
We have started to investigate heuristic measures for compound diagrams. Our
implementation (available at [20]) is capable of generating proofs for compound
diagrams, using either the zero heuristic, or a simple heuristic that is derived
only from the number of unitary components of the diagrams. Two rules were
excluded, namely D1 ⊢ D1 ⊔ D2, and False ⊢ D1. We have defined modified
versions of the unitary measures, which we expect to be able to integrate into 
an effective heuristic for compound diagrams, and we are exploring the use of 
additional structural information. 

Constraint diagrams are based on spider diagrams and include further syn- 
tactic elements, such as universal spiders and arrows. Universal spiders represent 
universal quantification (in contrast, spiders in spider diagrams represent exis- 
tential quantification). Arrows denote relational navigation. In [3] the authors 
give a reading algorithm for constraint diagrams. A constraint diagram reason- 
ing system (with restricted syntax and semantics) has been introduced in [16]. 
Since the constraint diagram notation extends the spider diagram notation, this 
is a significant step towards the development of a heuristic proof writing tool 
for constraint diagrams. In [16], the authors show that the constraint diagram 
system they call CD1 is decidable. However there are more expressive versions of 
the constraint diagram notation that may or may not yield decidable systems. If 
a constraint diagram system does not yield a decidable system then a heuristic 
approach to theorem proving will be vital if we are to automate the reasoning 
process. 

In addition to its use in automatic proof generation, our heuristic measure 
can also be used to support interactive proof writing (e.g. in an educational 
setting). It can advise the user on the probable implications of applying a rule 
(e.g. “If you remove this spider, then you will not be able to find a proof any 
more”, or “Adding contour B will decrease the contour difference measure, so 
might be a good idea”). Possible applications of rules could be annotated with 
their impact on the heuristic value. The user could collaborate with the proof
generator to solve complex problems. More research is needed to investigate how
useful our heuristic measures are in an interactive setting.

Abstract. Complex systems are characterized by components that may
have to be described using different notations. For the analysis of such 
a system, our approach is to transform each component (preserving be- 
haviour) into a single common formalism with appropriate analysis meth- 
ods. Both source and target notations are described by means of meta- 
modelling whereas the translation is modelled by means of graph trans- 
formation. During the transformation process, the intermediate models 
can be a blend of elements of the source and target notations, but at 
the end the resulting model should be expressed in the target notation 
alone. In this work we propose defining also a meta-model for the in- 
termediate process, in such a way that we can apply some validation 
methods to the transformation. In particular, we show how to prove 
functional behaviour (confluence and termination) via critical pair anal- 
ysis and layering conditions, and discuss other desirable properties of 
the transformation, namely: syntactic consistency and behaviour
preservation. The automation of these concepts has been carried out by
combining the AToM³ and AGG tools.

Keywords: Meta-Modelling, Graph Transformation, Multi-Formalism 
Modelling. 



1 Introduction 

The motivation for this work is the modelling, analysis and simulation of multi- 
formalism systems. These have components that may have to be described using 
different notations, due to their different characteristics. For the analysis of cer- 
tain properties of the whole system, or for its simulation, we transform each 
component into a common single formalism, where appropriate analysis or sim- 
ulation techniques are available. The formalism transformation graph (FTG) [4] 
may help in finding such a formalism. It depicts a small part of the “formalism
space”, in which formalisms are shown as nodes in the graph and the arrows
between them denote a homomorphic relationship “can be mapped onto”, using
symbolic transformations between formalisms. These transformations may lead
to a loss of information. Nonetheless, by performing the transformation we are
able to solve questions that were harder or impossible to answer in the source
formalism. Other arrows depict simulation and optimization (arrows starting
and arriving at the same formalism). Our approach is to define the syntax of the
formalisms via meta-modelling, and to formally express transformations between
formalisms by means of graph transformation. That is, starting with a graph G1
in formalism F1, applying a formalism transformation we obtain a graph G2
expressed in formalism F2. Some work regarding model transformation in the
context of multi-formalism modelling and simulation has already been done [5] [4],
where not only formalism transformation, but simulation and optimization
arrows have been modelled as graph transformation rules with AToM³ [4].

A similar situation arises with object oriented systems described with the 
UML, where different views of the system are described with different diagrams. 
For the analysis of such a multi-formalism system, the different diagrams can be 
translated into a common semantic domain [11, 12, 14]. 

The main contribution of the present work is proposing the creation of meta- 
models also for the intermediate models during the transformation (transforma- 
tion arrows in the FTG), and showing fundamental properties of the transforma- 
tion process. We prove confluence (proving that the result of the transformation 
is deterministic), termination (by modifying the layering conditions proposed 
in [3]), consistency (the resulting models should be valid instances of the target 
formalism meta-model) and behavioural equivalence of source and target mod- 
els. In addition, a prototype for the automation of the whole process has been 
realized by combining the AToM³ [4] and AGG [21] tools. We believe this work
can also be very valuable for the Model Driven Architecture (MDA) [15], where 
model transformation plays a central role. 

Although graph transformation has been widely used for expressing model 
transformations, there are not many attempts to formally validate the trans- 
formation process. For example, in [1] an environment is presented to define 
Function Block Diagrams models. They can be executed by transforming them 
into High Level Timed Petri Nets using graph grammar rules but no proofs are 
given for the properties of the transformation. In [12] critical pair analysis was 
used to prove confluence of a transformation from Statecharts into CSP (a tex- 
tual formalism). The present work is a step further, as here the target formalism 
is also graphical, we provide a meta-model for the intermediate models arising 
during the transformation and prove further transformation properties. 



2 A Meta-Model for the Transformation Process 

During the process of transforming a model from formalism F1 into formalism
F2, the intermediate models can have elements of both formalisms, as well as
extra elements. These are typically new links between elements of both formalisms,
and auxiliary entities. In a correct formalism transformation, at the end of the
transformation process the resulting model should be expressed in formalism F2
alone. If we formalize the syntax of both source and target formalisms using type
graphs (a concept similar to a meta-model, see below), this can be expressed as:

\[
TG_{F_1} \xleftarrow{\; type_{F_1} \;} G_1 \overset{tra_{F_{12}}}{\Longrightarrow}{}^{*} G_2 \xrightarrow{\; type_{F_2} \;} TG_{F_2}
\]

That is, there is a typing morphism type_F1 from G1 to the type graph of
formalism F1 (TG_F1). At the end of the transformation (expressed as the set
of production rules tra_F12 = {p1, ..., pn}), there is a typing morphism
type_F2 from G2 to the type graph of formalism F2 (TG_F2).

Fig. 1. Typing in the Formalism Transformation Process

In order to formalize the kind of manipulations that can be performed during
the process, a type graph TG_F12 should be defined for the intermediate graphs
derived during the transformation process. Both TG_F1 and TG_F2 should have
injective morphisms to TG_F12. By enforcing these morphisms, we make sure that
if graph G1 is typed over TG_F1 it is also typed over TG_F12, and the same for
graph G2. These relations are shown in the diagram in Figure 1.

For a graph G1 typed on TG_F1, the typing should be the same on type_F12,
that is F1 ∘ type_F1 = type_F12. A similar situation happens with a graph G2
typed on TG_F2 (F2 ∘ type_F2 = type_F12). Usually, the type graph TG_F12 is
the disjoint union of TG_F1 and TG_F2, with the possible addition of auxiliary
nodes and connections that may be used during the transformation process.
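
The typing condition can be illustrated with a deliberately simplified Python sketch. It only tracks node types (edges are ignored), represents each morphism as a dictionary, and uses invented identifiers such as `machine2place`; it is not how AToM³ or AGG represent type graphs.

```python
def compose(inclusion, typing):
    """Return inclusion o typing: each graph node mapped into the larger type graph."""
    return {node: inclusion[node_type] for node, node_type in typing.items()}


def is_injective(mapping):
    """True if no two keys are mapped to the same value."""
    return len(set(mapping.values())) == len(mapping)


# Hypothetical node types of the source, target and intermediate type graphs.
tg_f1 = {"Machine", "Queue"}
tg_f2 = {"Place", "Transition"}
tg_f12 = tg_f1 | tg_f2 | {"machine2place"}  # disjoint union plus an auxiliary type

f1 = {t: t for t in tg_f1}                      # injective morphism TG_F1 -> TG_F12
type_f1 = {"m1": "Machine", "q1": "Queue"}      # typing of a source graph G1
type_f12 = compose(f1, type_f1)                 # induced typing of G1 over TG_F12

print(is_injective(f1))                         # True
print(set(type_f12.values()) <= tg_f12)         # True: G1 is also typed over TG_F12
```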

In practice, in order to define a formalism, one sets additional constraints 
on type graphs. Typically, these include constraints on the multiplicities of re- 
lationships and additional ones expressed either textually (in the form of OCL 
formulae for example) or graphically (as visual OCL constraints [2] for example).
Throughout the paper we refer to the type graph with additional constraints as
“meta-model”. Thus, the meta-model for formalism F12 is formed by a type
graph TG_F12 as described before, together with the constraints of both source
and target meta-models, plus additional constraints on the new elements in
TG_F12 - (TG_F1 ∪ TG_F2). For an easier application of this approach we allow
the constraints coming from formalism F1 to be relaxed in meta-model F12. If
a model M1 meets the constraints of the meta-model for F1, it will meet the
constraints of the meta-model for formalism F12, as they are less restrictive.
In contrast, we cannot relax the constraints coming from the meta-model of
formalism F2, as then we cannot be sure that at the end of the transformation the
resulting model will meet the constraints of formalism F2. We give an example
of these concepts in the following section.

Fig. 2. A process interaction model



3 An Example: From Process Interaction to TTPN 

Suppose we want to transform a certain discrete event formalism [8] (in the
Process-Interaction style [10]) into Timed Transition Petri Nets (TTPN) [17],
as the latter formalism has analysis methods not usually available for
Process-Interaction, such as the ones based on reachability graphs or structural
analysis. The Process Interaction notation we consider here is specially tailored
to the manufacturing domain. Models are composed of a number of machines
interconnected via queues in which pieces can be stored. Machines can be in state
Idle or Busy and take a number of time steps (attribute Tproc) to process each
piece. They are also provided with attribute Tend, which signals the time at
which they will finish the processing of the current piece. Pieces are produced by
generators at certain rates (given by the Inter Arrival Time, IAT). Generators
are also provided with an additional attribute (Tnext) which signals the time of
the next piece generation. A unique global timer controls the current simulation
time (Time attribute) and the final time (FinalTime attribute). Figure 2 shows
an example, in which a machine can produce defective pieces that are then
processed by a second machine and then sent back to the first machine. A generator
produces pieces every 12 time steps and stores them in the queue named “Input”.

The meta-model for this formalism is shown to the left of Figure 3, enclosed in 
the dotted rectangle labelled as “Process Interaction Meta-Model”. The direction 
of connections is shown as small arrows near the connection name. In the Timer 
class, we have indicated between brackets the valid multiplicities of this kind of
element.

Timed Transition Petri nets (TTPN) are like regular black and white Petri
nets, but transitions are associated with a delay (“Time” attribute), in such a
way that, before firing, they have to be enabled for a number of units of time
equal to the delay. In this work we assume atomic firing (tokens remain in places
until transitions fire), age memory (each transition keeps the actual value of its
timer when a transition fires), and single server semantics (each transition has
only one timer). We have also added attribute Tleft to transitions, which controls
the time that remains before the transition fires (in case the transition is enabled).
Its value should be in the interval [0..Time]. The TTPN meta-model is depicted
to the right of Figure 3, enclosed in the dotted rectangle labelled as “TTPN
Meta-Model”.

Fig. 3. Meta-model for the transformation from process interaction into TTPN



The meta-model for the transformation process is also depicted in Figure 3, 
and is the disjoint union of the elements of both meta-models together with 
some extra links (shown in bold) that show how elements of both notations are 
related during the transformation process. We have relaxed some multiplicity 
constraints, in such a way that relationships “output” and “input” between ma- 
chines and queues have been assigned multiplicity “0..*” in the side of the queues. 
In the original Process Interaction meta-model, this multiplicity was “1..*” (each 
machine should have at least one input queue and at least one output queue). 
We have performed this relaxation in order to make the overall transformation 
process easier (as during the transformation process, machines will become un- 
connected). In the TTPN formalism we map pieces to tokens. The latter are 
represented as attributes of places. Tokens representing pieces are generated by 
transitions representing generators in the process interaction notation. A queue 
is represented by a place, and machines by two places representing the Busy and 
the Idle states. 

Once this meta-model is defined, we can model the formalism transformation. 
It can be applied to any process interaction model, as they are also typed over the 
meta-model for the transformation as stated in the previous section. For clarity, 
in the transformation rules we use the concrete syntax of both formalisms. In 
this case, it makes no difference to use concrete or abstract syntax as there is 
a one-to-one correspondence between the concrete and abstract syntax elements 
in both meta-models. 

Figures 4, 5, 6, 7 and 8 show the rules for the transformation. Nodes and
edges in the left hand side (LHS), right hand side (RHS) and negative application
condition (NAC) are assigned the same number if they represent the same
object in the matching. The idea of the transformation is to provide each
Process Interaction element with its corresponding Petri net element, transfer the
connectivity of the Process Interaction elements to the Petri net, and then erase
the Process Interaction elements.

[Figure 4, rule diagrams not reproduced. Rule 1: Convert Generators (the RHS
transition has Name="Gen"+vIAT, Time=vIAT, Tleft=vTnext-sTime). Rule 3: Convert
Machines (the LHS machine has Name=vName, State=vState, Tproc=vTproc,
Tend=vTend).]

Fig. 4. First three rules in the transformation from process interaction into TTPN

Rule 1 is applied to a generator not processed before (this is expressed with 
the NAC), and attaches a Petri net transition to it. The delay of this transition 
is set to the Inter Arrival Time (IAT) specified in the generator. The time which 
remains for firing is set to the time of the next generation event (signalled by 
Tnext attribute of the generator) minus the current simulation time (attribute
Time of the timer). In this way, the transition will fire at the same intervals
(every IAT units of time) in which the generator produces a piece, including the
first time (in Tleft units of time).
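
The attribute computation performed by Rule 1 can be sketched in Python. This illustrates the arithmetic only, not the graph rewriting itself: model elements are represented as plain dictionaries, and the function name `convert_generator` is hypothetical. The attribute names follow the rule as described above.

```python
def convert_generator(generator, timer):
    """Build the attributes of the Petri net transition attached to a generator."""
    return {
        "Name": "Gen" + str(generator["IAT"]),
        "Time": generator["IAT"],                    # delay = inter arrival time
        "Tleft": generator["Tnext"] - timer["Time"], # time until the first firing
    }


# Hypothetical model state: next piece due at time 20, current simulation time 15.
generator = {"IAT": 12, "Tnext": 20}
timer = {"Time": 15, "FinalTime": 100}

transition = convert_generator(generator, timer)
print(transition)  # {'Name': 'Gen12', 'Time': 12, 'Tleft': 5}
```

With these values the transition first fires after 5 time units and then every 12, as the generator would.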

Rule 2 attaches a Petri net place to each queue. Rules 4 and 5 will assign 
to this place as many tokens as pieces are connected to the queue. The third 
rule attaches two Petri net places to each machine. They represent the machine 
states Idle and Busy. The rule puts a token in the place representing the Idle 
state, although this can be changed later by rule 6. As the machine can be in 
only one state at the same time, later we will make sure that the number of 
tokens in both places is exactly one, using a well known structural property of 
Petri nets for capacity constraints. 

As stated before, rules 4 and 5 (in Figure 5) convert the pieces stored in the
queues into tokens for the place associated with the queue. Rule 4 is used when
the queue has more than one piece stored, while Rule 5 is used to delete the last
piece in the queue. By converting pieces into tokens we are losing information,
as pieces have information about their creation time (attribute Tcreation). This
information loss is acceptable if we are not interested in using that information
in the target formalism.

[Figure 5, rule diagrams not reproduced. Rule 4: Delete Pieces. Rule 5: Delete
Last Piece. Rule 6: Delete Piece in Machine. In the piece-deletion panels the
place attribute changes from Tokens=pTokens (LHS) to Tokens=pTokens+1 (RHS).]

Fig. 5. Second set of rules in the transformation from process interaction into TTPN

Rule 6 is applied when a machine is Busy, that is, it is processing a piece. In
this case, the piece is removed and the token is passed from the place representing
the Idle state to the place representing that the machine is Busy. We simply erase
the piece inside the machine; Rule 12 will configure the remaining time for firing
the Petri net transitions associated with the end of processing of the machine.

The third set of rules (depicted in Figures 6 and 7) is used to connect the Petri 
net elements according to the connectivity of the attached Process Interaction 
elements. Rule 7 connects the transition attached to each generator with the 
place attached to the connected queues. This rule is applied once for each queue 
attached to each generator. The connection between the generator and the queue 
is deleted to avoid multiple processing of the same generator and queue. 

Rule 8 connects (through a transition) the associated place of an incoming 
queue to a machine with the places associated with it. For the transition to 
be enabled, at least one token is needed in both the place associated with the 
queue (which represents an incoming piece), and the place representing the Idle
state of the machine. When the transition fires, it changes the machine state to 
Busy and removes a token from the place representing the queue. If the rule is 
executed, then the queue is also disconnected from the machine to avoid multiple 
processing of the same queue and machine, but this rule is applied for each queue 
attached to each machine. 

A similar situation (but for output queues) is modelled by Rule 9. It connects
(through a transition) the places associated with the machine to the place
associated with an output queue. For this transition to be enabled, the place
representing the Busy state should have at least one token. When the transition
fires, it changes the machine state back to Idle and puts a token in the place
representing the queue (modelling the fact that a piece has been processed by
the machine). If this rule is executed, the queue is also disconnected from the
machine, in the same way as the previous rule.

[Figure 6, rule diagrams not reproduced. Rule 7: Connect Generators. Rule 8:
Connect Input Queues.]

Fig. 6. Third set of rules in the transformation (rules 7 and 8)

Finally, Rule 10 models a similar situation to the previous one, but in this 
case the machine is busy. The only change with respect to the previous rule 
is that we have to configure the Petri net transition with the units of time 
remaining for firing. This is calculated as the time at which the machine finishes 
the processing of the piece minus the current simulation time. 

The last set of rules (shown in Figure 8) simply removes the Process Interaction
elements, once the connectivity between these elements has been transferred
to the Petri net elements. We want Rules 11-13 to be applied once, only when
the Process Interaction elements are unconnected. We take advantage of the
dangling edge condition of the Double Pushout Approach [19] for this. In this
way, if any Process Interaction element has some connection, the rules cannot
be applied, and we make sure that Rules 7-10 are applied as many times as
possible. The timer can only be deleted once we have processed all generators
and pieces; otherwise we may need it in Rules 1 and 10. With the NACs, we make
sure that the timer is deleted once machines and generators have been processed
and thus deleted.

[Figure: Rule 9: Connect Output Queues Idle; Rule 10: Connect Output Queues Busy]
Fig. 7. Third set of rules in the transformation (rules 9 and 10)

[Figure: Rule 11: Delete Generators]
Fig. 8. Fourth set of rules in the transformation from process interaction into TTPN

Fig. 9. Result of the transformation of the model in figure 2

The transformation has four well-defined phases: creation of the target
formalism elements (Figure 4), deletion of pieces (Figure 5), connection of the
target formalism elements (Figures 6 and 7) and finally deletion of the source
formalism elements (Figure 8). Although not strictly necessary, we can use a
simple control structure for rule execution based on layers. Thus, we assign the
rules of each phase to a different layer (numbered from 0 to 3; this numbering
will be used later in the validation process). Rules in each layer are applied
as long as possible. When none of the rules of layer n is applicable, the rules
of layer n + 1 are considered. The graph grammar execution ends when none of the
rules of the highest layer is applicable. Figure 9 shows the transformation into
TTPN of the model in Figure 2.
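
The layered control structure described above can be illustrated with a minimal
sketch. It assumes, purely for illustration, that a rule is a Python callable
that rewrites a host graph in place and returns True when it found a match; the
names run_layered, create_place and delete_block, and the dictionary used as a
host graph, are hypothetical and are not part of AToM 3 or AGG.

```python
from typing import Callable, List

Rule = Callable[[dict], bool]  # hypothetical: True if the rule matched and rewrote the graph


def run_layered(graph: dict, layers: List[List[Rule]]) -> dict:
    """Apply the rules of each layer as long as possible, then move to the next layer."""
    for rules in layers:
        applied = True
        while applied:
            # any() stops at the first rule that fires; the layer is then re-scanned
            applied = any(rule(graph) for rule in rules)
    return graph


# Toy rules: layer 0 creates one place per block, layer 1 deletes the blocks.
def create_place(g: dict) -> bool:
    for block in g["blocks"]:
        if block not in g["places"]:  # plays the role of a NAC
            g["places"].append(block)
            return True
    return False


def delete_block(g: dict) -> bool:
    if g["blocks"]:
        g["blocks"].pop()
        return True
    return False


result = run_layered({"blocks": ["q1", "m1"], "places": []},
                     [[create_place], [delete_block]])
```

Execution never returns to an earlier layer, which is why each layer has to
terminate on its own.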

4 Validating the Transformation Process 

For a model transformation to be useful, one needs to ensure that it has
certain properties. The first one is functional behaviour, which can be further
decomposed into termination and confluence, and ensures that the same output is
obtained from equal inputs. Consistency ensures that models resulting from the
transformation are valid instances of the target meta-model. Finally, one would
usually like the source and target models to be equivalent in behaviour. These
properties are examined in the following sections.

4.1 Functional Behaviour 

A rewriting system has a functional behaviour if it is terminating and confluent.
Confluence means that the order of rule applications does not matter, i.e. all
computations end with the same result. Confluence can be shown if the system
is terminating and all different rule applications on one and the same graph are
confluent in the end, that is, if the system is locally confluent. In the
following, we first give termination criteria for graph transformation systems
and then consider the check for confluence.

Termination is undecidable in general, but we can show that the example
transformation terminates. Starting from finite, correct models in the Process
Interaction formalism, the rules can only be applied finitely many times. Rules
1-3, which associate Petri net elements with the Process Interaction blocks, can
only be applied once for each block. This is controlled by the NACs. Rules in
Figure 5 erase pieces, so their application terminates if there are a finite number
of them in the initial model (which is true since the model is finite). Rules
7-10 can be applied only a finite number of times, since they are applied once
for each generator2queue, input and output relationship in the original Process
Interaction model. Finally, Rules 11-13 also terminate, as they remove Process
Interaction elements, and there are a finite number of them. Note also that we
do not have cycles: no new Process Interaction nodes or relationships are created
that could trigger the iterative execution of the rules.

In [18] termination is proved by defining layering conditions, which all rules
must fulfil. This concept was further refined in [3], where the following layering
functions are defined:

1. $cl, dl : T \to \mathbb{N}$ are total functions assigning each node and edge
   type a unique creation and deletion layer such that
   $\forall t \in T : 0 \le cl(t) \le dl(t) \le n$.

2. $rl : R \to \mathbb{N}$ is a total function which assigns each rule a unique layer.

The execution of a graph grammar terminates (assuming a finite host graph)
if all rules $r \in R$ with $rl(r) = k$ fulfil the following layering conditions:

(a) r deletes only nodes and edges of types t such that $dl(t) \le k$,

(b) r creates only nodes and edges of types t such that $cl(t) > k$,

(c) r deletes at least one node or edge,

(d) r uses only NACs over nodes and edges of types t such that $cl(t) \le k$.
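
As a minimal sketch, conditions (a)-(d) can be checked mechanically once each
rule is summarised by the types it deletes, creates and mentions in its NACs.
The RuleInfo record and the function name below are hypothetical, and the
comparisons in (a) and (d) are read as non-strict (less than or equal), which is
the reading required by the layer assignment used in the example of this
section.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RuleInfo:
    """Hypothetical summary of a rule: its layer rl(r) and the types it touches."""
    layer: int
    deletes: Set[str] = field(default_factory=set)
    creates: Set[str] = field(default_factory=set)
    nac_types: Set[str] = field(default_factory=set)


def satisfies_deletion_layering(r: RuleInfo, cl: Dict[str, int], dl: Dict[str, int]) -> bool:
    k = r.layer
    return (
        all(dl[t] <= k for t in r.deletes)           # (a) deleted types have dl(t) <= k
        and all(cl[t] > k for t in r.creates)        # (b) created types have cl(t) > k
        and len(r.deletes) > 0                       # (c) at least one deletion
        and all(cl[t] <= k for t in r.nac_types)     # (d) NAC types have cl(t) <= k
    )


# Layer values taken from the example: source = 1, aux = 2, target = 3.
cl = {"source": 1, "aux": 2, "target": 3}
dl = {"source": 1, "aux": 2, "target": 3}

connect_rule = RuleInfo(layer=2, deletes={"source"}, creates={"target"})
create_rule = RuleInfo(layer=0, creates={"target", "aux"}, nac_types={"aux"})

ok_connect = satisfies_deletion_layering(connect_rule, cl, dl)
ok_create = satisfies_deletion_layering(create_rule, cl, dl)
```

The layer-0 rule fails the check because it deletes nothing and its NAC uses a
type of a higher layer, which is the reason creation layers need separate
conditions.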

We cannot apply this concept directly, because we both create elements of the
target formalism (in Rules 1-3, Layer 0) and delete elements of the source
formalism, and it is not possible to find an appropriate layer order. Still, we
can modify the previous concept (which was used for parsing purposes) to make
it suitable for model transformation grammars. We can classify the layers of our
grammar as either deletion or creation layers. The former delete elements of the
source formalism, while the latter create elements of the target formalism. For
the deletion layers we can use the previous layering conditions.

In our example, Layers 1-3 are deletion layers and we can meet the layering
conditions by specifying the layering functions as follows:
$cl(source) = dl(source) = 1$, $cl(aux) = dl(aux) = 2$,
$cl(target) = dl(target) = 3$, where source and target are the node and edge
types of the source (Process Interaction) and target (TTPN) formalisms, while
aux refers to the auxiliary node and edge types included in the meta-model for
the transformation (busy, idle, place2queue and generator2transition edges in
our example).

For the creation layers, in which we create new elements of the target
formalism (such as layer 0 in the example), we can define new layering
conditions, where the first two conditions are the same as before:

3. There exists an injective morphism $m : N \to R$ with $m \circ n \circ l = r$
   for each Double Pushout Approach rule
   $N \xleftarrow{n} L \xleftarrow{l} K \xrightarrow{r} R$.

Conditions 1 and 2 do not change. Deletion of at least one element is not
required, but we must ensure that the application of rules is still bounded.
This is expressed by condition 3, which states that, if a rule creates new
elements, a forbidden graph structure is created by its application such that
this rule cannot be applied again at the same match. Since we still have the
conditions that newly created elements are of an upper layer while deleted
elements are not, these new layering conditions are also sufficient conditions
for termination, assuming that the graphs and sets of rules are finite. In our
example, Rules 1-3 belong to layer 0, which is a creation layer, for which we
should check the new layering conditions. These are satisfied, as the rules
create elements of the target and aux layers only ($cl(aux) = 2$,
$cl(target) = 3$) and there is a morphism from each NAC to the rule’s RHS
compatible with the LHS. Thus, the rules can be applied only once for each
element of the source formalism.

This concept of creation and deletion layers is not only applicable to the 
specific example proposed in this paper, but is also applicable to any other 
graph grammar. 

Critical pair analysis is known from term rewriting and is usually used 
to check if a term rewriting system is confluent. Critical pair analysis has been 
generalized to graph rewriting [16], and formalizes the idea of a minimal exam- 
ple of a conflicting situation. From the set of all critical pairs we can extract 
the objects and links that cause conflicts or dependencies. A system is locally 
confluent if all conflicting pairs are confluent, that is, they are extendible by 
transformation sequences leading to a common successor graph. In [12], criti- 
cal pair analysis was extended to attributed graph transformation. Critical pair 
analysis has been automated in the AGG tool. For the example we are dealing 
with, AGG finds no critical pairs. Note also that only rules in the same layer 
have to be checked for conflicts. 



4.2 Consistency 

With respect to consistency (that is, the final models are correct instances of
the TTPN meta-model alone), all rules produce model elements correctly typed
according to the target meta-model. Note that if the transformation terminates,
Rules 11-13, which delete Process Interaction elements connected to TTPN
elements, are no longer applicable. This means that either we do not have
Process Interaction elements or they are not connected to the TTPN elements
(so that Rules 11-13 cannot be applied). The latter is not possible, as
we connect each Process Interaction element with a TTPN element by the
application of Rules 1-3. Finally, the TTPN elements are correctly connected, as
the only rules that connect them are Rules 7-10, and these connections are not
modified later.



4.3 Behavioural Equivalence 

With respect to behavioural equivalence, Figure 10 shows how the Process
Interaction constructs are translated into TTPN. We (informally) show that this
is their intended meaning. Figure 10 (a) shows the conversion of a queue with a
number of attached generators. In the meta-model it is specified that a queue
can receive connections from zero or more generators. In a TTPN model, this is
equivalent to having a place connected to one transition for each generator
connected to the original queue. This is exactly what we obtain when
transforming the left hand side of Figure 10 (a) by means of the rules
indicated on top of the arrow.

Figure 10 (b) shows the conversion of a number of queues incoming to a
machine. A machine can have one or many incoming queues connected to it. What
we want to express in an equivalent TTPN model is that there is a connection
from the place representing each queue to the places representing the machine
through independent transitions. The transitions connecting the places
representing the queues are in conflict when the machine is idle. This means
that only one of the queues can put a piece in the machine. That is, we have
modelled mutual exclusion of the machine.

In a similar way, Figures 10 (c) and (d) model the translation of the output
queues connected to a machine. Machines can have one or more output queues.
What we want to model here is that only one queue receives the processed
piece. This is achieved by creating an independent transition for each queue, and
making them synchronized (same associated time) and in conflict, so all of them
will be enabled at the same time, but only one of them will fire.

5 Realization in AToM³ and AGG

The previous transformation has been modelled with AToM³. As critical pair
analysis is available in the AGG tool [21], we extended AToM³ with the capability
to export graph grammars to AGG. This was not a straightforward task, as
AToM³ and AGG are implemented in different languages (Python and Java,
respectively) and the user may use these languages to express applicability
conditions as well as attribute computations. Additionally, the attribution
concept follows a different philosophy in each tool. In AGG, variables are
assigned to attributes (possibly the same variable to more than one attribute,
which means that they should have the same value). These variables can be used
in the attributes of RHS nodes to calculate new values, and in additional
applicability conditions. In AToM³, however, there is an API to access nodes
(via labels) and their attributes. In the applicability conditions it is
possible to use the AToM³ API to formulate expressions on combinations of
several attributes. In the nodes of the RHS, the value of the same attribute of
the mapped node in the LHS can be copied, a concrete value can be specified, or
it can be calculated with a program. Copying attribute values from LHS to RHS is
straightforward to translate into AGG, because it is equivalent to defining a
variable and assigning it to both attributes in LHS and RHS. For the
translation of Python expressions (using the AToM³ API) into Java, a small
translator was implemented.

Fig. 10. Equivalent TTPN constructs and their translation

Another issue to be taken into account is that in AToM³ the execution control
of rules is different from the one in AGG. In AToM³, each rule is given a
priority (partial order). The graph rewriting engine starts checking the set of
rules with the highest priority. If some of them can be executed, then it checks
again for the applicability of one of the rules in the set. Otherwise, it tries
the set with the next priority. If some of them are applicable, then it goes
back to the set of higher priority rules. In AGG, there is the concept of
layers, but there is no loop to the highest layer when a rule of a lower layer
gets executed. However, if all the rules in AToM³ are given the same priority,
the execution is the same as in AGG if no layers are used. If priorities are
used in AToM³, only rules with the same priority can be in conflict.
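
The priority-based control described above can be sketched as follows. This is
only an illustration of the control loop, not the code of either tool: rules
are assumed to be Python callables returning True when they fire, a lower
number is assumed to mean a higher priority, and run_with_priorities and the two
toy rules are hypothetical names.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]  # hypothetical: True if the rule matched and fired


def run_with_priorities(graph: dict, rules_by_priority: Dict[int, List[Rule]]) -> dict:
    """After any rule fires, go back to the set of rules with the highest priority."""
    order = sorted(rules_by_priority)  # assumption: lower number = higher priority
    restart = True
    while restart:
        restart = False
        for priority in order:
            if any(rule(graph) for rule in rules_by_priority[priority]):
                restart = True  # loop back to the highest-priority set
                break
    return graph


# Toy rules: the low-priority rule re-enables the high-priority one.
def high(g: dict) -> bool:
    if g["a"] > 0:
        g["a"] -= 1
        g["log"].append("high")
        return True
    return False


def low(g: dict) -> bool:
    if g["b"] > 0:
        g["b"] -= 1
        g["a"] += 1
        g["log"].append("low")
        return True
    return False


trace = run_with_priorities({"a": 0, "b": 2, "log": []}, {0: [high], 1: [low]})["log"]
```

In this toy run the two rules alternate, because every firing of the
low-priority rule sends control back to the high-priority set. Under a layered
control without that loop, the low-priority rule would fire twice and the
high-priority rule would never be reconsidered.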

Furthermore, one can apply sequences of graph grammars, where we define
the start graph for the first graph grammar, and the start graph for each
subsequent grammar is the result of the computation of the previous one. This is
useful if several phases can be identified in the computations. If all the rules
in each graph grammar are given the same priority, it is equivalent to the
layering concept of AGG. As stated before, this concept of layers was used in
the example: the process has been decomposed into four graph grammars that
should be applied in sequence. This partitioning of the computation into several
graph grammars has several practical advantages. The computation can be
performed more efficiently, as at each moment fewer rules have to be considered
by the graph rewriting engine, because the rules that cannot be executed at
that moment have been separated into a different graph grammar. The possibility
of conflicting rules is also reduced, and thus the computation of critical pairs
in AGG can be notably reduced. Furthermore, a demonstration of termination in
the style of [18], using the concept of layers, could be used for each
particular graph grammar.



6 Related Work 

In addition to the model transformation approaches referred to in the
introduction, there is related work that is worth mentioning. In [22], model
checking is proposed as a means to validate model transformations specified
using graph transformation. A graph transformation system can be expressed as a
Kripke structure, so the properties of interest about the transformation can be
modelled using temporal logic formulae. For our approach, one could express
behavioural equivalence properties (of both source and target models) using
temporal logics. For example, one may check whether certain structures in the
models of the initial formalism always result in the same kind of structures
(that we consider equivalent in behaviour) in the target formalism (see
Section 4.3 for a discussion of this issue).

In [9] a meta-model for the transformations between Entity-Relationship and
Relational Data models was defined using the Meta Object Facility (MOF) [15].
The central issue here was the definition of invariants (using OCL) for the
transformation process (not specified using graph transformation). These
invariants are a means to specify properties that the transformation should
have. The authors suggest that it may be possible to derive transformation
classes or operations from these constraints.

The graph transformation approach taken in this work resembles transformation
units [13] to some extent. These encapsulate a specification of initial graphs
(source formalism meta-model in our approach) and terminal graphs (target
formalism meta-model in our approach), a set of identifiers referring to other
transformation units to be used, a set of graph transformation rules, and a
control condition for the rules. Note that this approach only defines conditions
to be met by source and target graphs, not by intermediate graphs. Constraining
the intermediate graphs as well is one of the key points of the work presented
in this paper.

Another related approach is the use of triple graph grammars [20]. In this
way, one could obtain translators from the source to the target formalism and
vice versa with the same grammar. The approach is mostly useful for
syntax-directed environments, in which the editing actions are specified by
means of graph grammar rules. With triple graph grammars, while the user is
building a model in the source formalism, an equivalent model is created at the
same time in the target formalism. It is not straightforward to use this
approach to translate an existing model, as in this case the graph grammar rules
must be monotonic (any production’s LHS must be part of its RHS [20]).

7 Conclusions 

In this paper, we proposed a model transformation based on meta-models such
that there is also a meta-model for intermediate models. This meta-model
contains the meta-models of both the source and target formalisms, as well as
additional elements. This formalization allows the validation of the functional
behaviour of the transformation process and also makes it easier to check other
properties, such as consistency and behavioural equivalence. In particular, we
have modified the layering conditions for terminating parsing processes
proposed in [3] to make them suitable for model transformation grammars. We have
mostly automated these concepts with AToM³ and AGG and provided an example where
we have validated the transformation of a notation for Discrete Event Simulation
in the Process Interaction style into TTPN.

In the future, we would like to use this transformation grammar to auto- 
matically derive simulators [7]. If a simulator for the source formalism is given, 
the transformation could be directly extended to simulation rules and we would 
obtain a simulator for the target formalism (expressed as a graph transforma- 
tion system). Given a simulator for the target formalism, e.g. if we want to give 
a formal semantics to some visual model, the model transformation has to be 
inverted to yield a simulator for the visual models. It is up to future work to 
find out when and how a model transformation can be inverted. 

Altogether, the sample model transformation presented in this paper shows
that graph transformation is a very natural means to transform visual models
and offers a variety of validation facilities to ensure correctness of these
transformations.



Abstract. In this paper we examine inter-diagrammatic reasoning
(IDR) as a framework for digital geometry. We show how IDR can be
used to represent digital geometry in two dimensions, as well as
providing a concise language for specifying algorithms. As an example, we
specify algorithms and examine the algorithmic complexity of using IDR
for reasoning about the relationships between planar regions. Finally,
we discuss the circumstances under which the algorithmic abstractions
in IDR can produce efficient algorithms under realistic computational
assumptions.



1 Introduction 

Inter-diagrammatic reasoning (IDR) [1, 2] was developed to provide a logical
framework for reasoning about information in diagrams. Specifically, it was
designed so the fundamental computation step is the combination of multiple
diagrams: problems are solved and inferences are made by combining the
information represented on different diagrams rather than by reasoning about the
contents of a single diagram. It provides a set of abstract operators that can
be used to concisely specify an algorithm over diagrams, and has been used in
a range of domains from geography to music.

Digital geometry [3, 4, 5, 6, 7, 8] is concerned with geometry on discrete
space, and how the properties of such a space compare with continuous space.
The work is motivated by the fact that information from continuous space may
be approximated by information on a discrete set of areas (such as a rectangular
array of pixels), and so reasoning about the continuous space involves
computation on the discrete data. Much of the work in this area concerns the
topology of such spaces, particularly how topological features in continuous
space are reflected in the corresponding digital space. This has important
applications in medical (and other) image processing.

In this paper we concern ourselves with planar geometry, and show how IDR
provides an algorithmic and computational framework for computing in planar
digital space. We examine the characteristics of digital geometry and digital
pictures, and examine inter-diagrammatic reasoning as a way to represent data
and computations relating to planar digital pictures. To provide evidence of
IDR’s applicability, we look at the problem of determining basic topological
relations between regions in continuous space, map these problems into digital
space, and give algorithms for solving these problems using IDR.



2 Digital Geometry and Digital Pictures 

The key difference between digital and continuous geometry is the atomic unit
involved: continuous geometry is built from points (arbitrary $(x, y)$ pairs in
the Euclidean plane [9, 10]); digital geometry is built from discrete areas
(which we will, for convenience, call pixels) in that plane, each of which
corresponds to the open set of points in an area bounded by a simple closed
curve. While the universe in continuous planar geometry is $E^2$, the Euclidean
plane, the universe in digital geometry is a countable set of pixels $V$ [11],
such that the intersection of any two pixels is null, and the set of pixels and
their boundaries cover the plane, i.e.

$$\bigcup_i \left( v_i \cup \partial_{v_i} \right) = E^2$$

where $v_i$ is the $i$th pixel and $\partial_{v_i}$ is its boundary. In this
paper we will use $\partial_{(x,y)}$ to denote the boundary between $x$ and $y$;
when used with a single subscript (as here), it denotes the boundary between the
subscripted item (the interior of the boundary) and the rest of the plane (the
exterior of the boundary).

Herman [3] gives a general definition of a digital space as $(V, \pi)$, where
$V$ is a set of regions and $\pi \subseteq V \times V$ is a symmetric adjacency
relation. Using pixels as above, we will adapt this into a definition for a
digital space in a plane: a countable set of pixels $V$, and a symmetric
adjacency relation $\pi$. We will restrict the notion of adjacency to require
adjacent elements to have intersecting boundaries, but do not require all pixels
with intersecting boundaries to be adjacent, i.e.

$$\pi \subseteq \{ (v_i, v_j) : \partial_{v_i} \cap \partial_{v_j} \neq \emptyset \}.$$

The result of this restriction is that the pixel adjacency relation is somewhat less
general than an arbitrary undirected graph (where any pair of vertices might be
adjacent). Since $\pi$ is a subset of all physically adjacent pixels we are free
to further restrict the adjacency relation when appropriate.
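
As a small illustration of such a restriction, consider the common case of a
rectangular grid of square pixels (an assumption made only for this sketch).
Two distinct unit squares have intersecting boundaries exactly when their grid
coordinates differ by at most one in each direction, and the familiar
4-adjacency keeps only the pairs that share an edge. The function names below
are hypothetical.

```python
from itertools import product


def touching(p, q):
    """Distinct unit squares whose boundaries share at least one point."""
    return p != q and abs(p[0] - q[0]) <= 1 and abs(p[1] - q[1]) <= 1


def four_adjacent(p, q):
    """A restricted adjacency relation: only edge-sharing pixels are adjacent."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1


pixels = list(product(range(3), range(3)))  # a 3 x 3 grid of pixels
touch = {(p, q) for p in pixels for q in pixels if touching(p, q)}
adj4 = {(p, q) for p in pixels for q in pixels if four_adjacent(p, q)}
```

Both relations are symmetric, and the restricted relation is a subset of the
pairs with intersecting boundaries, as the definition requires.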

A digital picture is a digital space where each pixel has an associated value,
which is its grey level or color. More precisely, a digital picture is a triple
$(V, \pi, f)$, where $V$ and $\pi$ are defined as above, and
$f : V \to \mathbb{R}$ is the function that defines the value of each pixel. An
interesting special case [3, 12] is that of binary pictures, where $f$ maps each
pixel to either 0 or 1; an important problem in medical image processing is
segmentation, computing a binary picture from one with real pixel values.

3 Inter-diagrammatic Reasoning

Inter-diagrammatic reasoning was designed so the fundamental computation step 
is the combination of multiple diagrams. In any application, these diagrams are 
based on identical discrete tessellations of a planar region, and the information 
from any two diagrams can be combined by applying a simple function to all 
corresponding tessera pairs. 

Specifically, in IDR a diagram is a bounded planar region with a finite
tessellation, where each tessera $t$ has a corresponding value, its color,
denoted $f_t$. This color is from a subtractive CMY color scale $c_{i,j,k}$ [13],
intuitively corresponding to transparent color filters where $i$, $j$, and $k$
are the cyan, magenta, and yellow contributions respectively, each constrained
to be an integer value from 0 to $N - 1$. These “filters” combine additively to
produce values from a minimum of $c_{0,0,0}$ (WHITE) to $c_{N-1,N-1,N-1}$
(BLACK). For any tessellation, we distinguish the null diagram, denoted
$\emptyset$, where all tesserae are valued WHITE.

The basic operators are unary and binary functions on diagrams: these bi- 
nary functions demand that the diagrams have the same underlying tessellations, 
so the notion of corresponding tesserae is simply defined. The use of these op- 
erations has been demonstrated in a number of domains where there is some 
natural similarity between diagrams: game boards, maps, and musical notation 
for example, as well as more generally usable ’diagrams’ such as standard x-y 
grids of pixels. 

The basic functions that combine two diagrams into a new one are based on 
mappings on the corresponding tesserae: each tessera in the result is based on the 
values from the corresponding tesserae in the inputs. Similarly, the not function 
returns a diagram where each tessera is the negation of the corresponding input 
tessera. The color mappings are based on addition, subtraction, max, and min, 
independently for each color, with each color component restricted to be in the 
range [0 ,N — 1]. The rules for combining the color values of pixels are: 

$$
\begin{aligned}
\max(c_{u,v,w}, c_{x,y,z}) &= c_{\max(u,x),\,\max(v,y),\,\max(w,z)} \\
\min(c_{u,v,w}, c_{x,y,z}) &= c_{\min(u,x),\,\min(v,y),\,\min(w,z)} \\
c_{u,v,w} + c_{x,y,z} &= c_{\min(u+x,\,N-1),\,\min(v+y,\,N-1),\,\min(w+z,\,N-1)} \\
c_{u,v,w} - c_{x,y,z} &= c_{\max(u-x,\,0),\,\max(v-y,\,0),\,\max(w-z,\,0)} \\
\neg c_{u,v,w} &= c_{N-1-u,\,N-1-v,\,N-1-w}
\end{aligned}
$$
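
The color rules above translate directly into code. The following is a minimal
sketch in which a color is a tuple of three integers (cyan, magenta, yellow);
the value N = 4 and the function names are assumptions made for illustration.

```python
N = 4                            # hypothetical number of levels per color component
WHITE = (0, 0, 0)
BLACK = (N - 1, N - 1, N - 1)


def c_max(a, b):
    """Component-wise maximum."""
    return tuple(max(x, y) for x, y in zip(a, b))


def c_min(a, b):
    """Component-wise minimum."""
    return tuple(min(x, y) for x, y in zip(a, b))


def c_add(a, b):
    """Component-wise sum, saturating at N - 1."""
    return tuple(min(x + y, N - 1) for x, y in zip(a, b))


def c_sub(a, b):
    """Component-wise difference, saturating at 0."""
    return tuple(max(x - y, 0) for x, y in zip(a, b))


def c_not(a):
    """Negation: the complement of each component with respect to N - 1."""
    return tuple(N - 1 - x for x in a)
```

Because addition and subtraction saturate, every result stays within the range
from WHITE to BLACK.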

The functions defined on diagrams are the following, using $d_a$ and $d_b$ to
denote two diagrams of equal tessellation, and $f_{r_i}$, $f_{a_i}$, and
$f_{b_i}$ to denote the color values of corresponding tesserae in the result,
$d_a$, and $d_b$ respectively. These are two-place functions unless stated
otherwise:

Or denoted $d_a \vee d_b$ returns a diagram where every tessera’s value is the
maximum of its corresponding tesserae in $d_a$ and $d_b$, that is,
$f_{r_i} = \max(f_{a_i}, f_{b_i})$.

And denoted $d_a \wedge d_b$ returns a diagram where every tessera’s value is
the minimum of its corresponding tesserae in $d_a$ and $d_b$, that is,
$f_{r_i} = \min(f_{a_i}, f_{b_i})$.

Overlay denoted $d_a + d_b$ returns a diagram where every tessera’s value is the
sum of its corresponding tesserae in $d_a$ and $d_b$, that is,
$f_{r_i} = f_{a_i} + f_{b_i}$.

Peel denoted $d_a - d_b$ returns a diagram where every tessera’s value is the
difference of its corresponding tesserae in $d_a$ and $d_b$, that is,
$f_{r_i} = f_{a_i} - f_{b_i}$.

Not denoted $\neg d_a$ is a one-place operator that returns a diagram where
every tessera’s value is the difference between BLACK and its corresponding
tessera, that is, $f_{r_i} = \mathrm{BLACK} - f_{a_i}$.
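
A minimal sketch of these diagram operators follows. To keep it short, a diagram
is assumed to be a dictionary from tessera identifiers to a single grey level
between 0 (WHITE) and N - 1 (BLACK) rather than a full CMY triple; this
simplification, the value N = 4, and all function names are assumptions made for
illustration.

```python
N = 4  # hypothetical number of grey levels; 0 is WHITE, N - 1 is BLACK


def combine(op, da, db):
    """Apply op to corresponding tesserae of two diagrams with the same tessellation."""
    if da.keys() != db.keys():
        raise ValueError("diagrams must have the same tessellation")
    return {t: op(da[t], db[t]) for t in da}


def d_or(da, db):
    return combine(max, da, db)


def d_and(da, db):
    return combine(min, da, db)


def d_overlay(da, db):
    return combine(lambda a, b: min(a + b, N - 1), da, db)  # sum, capped at BLACK


def d_peel(da, db):
    return combine(lambda a, b: max(a - b, 0), da, db)      # difference, floored at WHITE


def d_not(da):
    return {t: N - 1 - v for t, v in da.items()}


da = {(0, 0): 3, (0, 1): 0}
db = {(0, 0): 1, (0, 1): 2}
```

The explicit check on the keys reflects the requirement that binary operators
are only defined for diagrams over the same tessellation.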

Additionally, there are the following mapping functions defined on sequences of 
diagrams: 

Accumulate denoted $\alpha(d, s, \circ)$ is a three-place function taking an
initial diagram $d$, a sequence of diagrams $s = d_1, d_2, \ldots, d_n$, and a
binary diagrammatic function $\circ$, that returns the accumulation
$(\cdots((d \circ d_1) \circ d_2) \cdots) \circ d_n$, the result of successively
applying $\circ$ to $d$ and each diagram in $s$.

Map denoted $\mu(g, s_1, \ldots, s_n)$, which is an $n + 1$ place function that
takes an $n$ place function $g$ and $n$ diagram sequences of equal cardinality,
and returns the diagram sequence resulting from applying the function $g$ to
the corresponding elements of each diagram sequence.

Filter denoted $\phi(g, s)$ is a two place function that takes a boolean
function $g$ and a sequence of diagrams $s$ and returns a sequence of diagrams
which are the members of $s$ for which $g$ returns TRUE.

Additional operators include:

Null denoted $\eta(d_a)$ is a one place boolean function that returns TRUE if
all the tesserae in $d_a$ are WHITE.

Lambda denoted $\lambda v.b$, where $v$ is a variable symbol (or list of
variable symbols separated by commas) and $b$ is an expression. Lambda is used
for function abstraction; the value of a Lambda applied to variables is the
value of the form $b$, substituting the values of the variables for the symbols
in $v$ that occur in $b$.
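
The sequence operators correspond closely to the familiar reduce, map and filter
functions. The sketch below assumes, for brevity, that a diagram is a dictionary
from tessera identifiers to a single grey level with 0 as WHITE; this encoding
and the function names are illustrative assumptions, not part of IDR itself.

```python
from functools import reduce

WHITE = 0  # single grey level per tessera, an assumption to keep the sketch short


def accumulate(d, s, op):
    """Accumulate: (...((d op d1) op d2)...) op dn for the sequence s = d1, ..., dn."""
    return reduce(op, s, d)


def map_diagrams(g, *seqs):
    """Map: apply g to corresponding elements of sequences assumed to have equal length."""
    return [g(*ds) for ds in zip(*seqs)]


def filter_diagrams(g, s):
    """Filter: the members of s for which the boolean function g returns True."""
    return [d for d in s if g(d)]


def is_null(d):
    """Null: True if every tessera of d is WHITE."""
    return all(v == WHITE for v in d.values())


def d_or(da, db):
    """Or: tessera-wise maximum of two diagrams over the same tessellation."""
    return {t: max(da[t], db[t]) for t in da}


null_diagram = {0: 0, 1: 0}
seq = [{0: 1, 1: 0}, {0: 0, 1: 0}, {0: 0, 1: 2}]

union = accumulate(null_diagram, seq, d_or)
non_null = filter_diagrams(lambda d: not is_null(d), seq)
doubled = map_diagrams(d_or, seq, seq)
```

Python's own lambda plays the role of the Lambda operator here, as in the
argument passed to filter_diagrams.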

The correspondences between IDR and digital pictures (as defined above)
are fairly straightforward: the set of tesserae corresponds to $V$, and the
colors of the tesserae provide the mapping $f$ (although the colors are from a
discrete set rather than real numbers). There is nothing defined in IDR that
corresponds to the adjacency relation $\pi$, so if such information is needed it
must be defined separately (which we do in Section 6).



4 Relations Between Bounded Regions 
in Continuous Space 

Researchers in spatial reasoning and geographic information systems have
proposed topological relations among regions in continuous space based on their
intersections. A region here is defined relative to a simple closed curve that
partitions the plane into an interior, a boundary (the curve), and an exterior.
Two independently-developed approaches are the Region Connection Calculus (RCC)
[14, 15, 16] and the 9-intersection model ($I9_r$) [9, 17, 6]. For both of
these, the relations (between two regions) can be defined by the intersections
of the interiors, exteriors, and boundaries of the regions. Each has an
eight-relation set and a five-relation set: these are summarized in Table 1 and
Figure 1. The five-relation sets are those that can be defined without
considering intersections of the boundaries; they can also be obtained as
disjunctions of the appropriate pairs of relations in the eight-relation set.
Renz et al. [8] define an alternative RCC-5 (that cannot be defined without the
boundary intersections) by disjoining different RCC-8 relations.

Table 1. Topological relations between region x and region y. $I_r$, $E_r$, and
$\partial_r$ denote the interior, exterior, and boundary of region r respectively

High resolution/RCC-8 relations

$I9_r$ relation        RCC relation     definition (intersections)
---------------------  ---------------  ------------------------------------------------------------
disjoint(x,y)          DC(x,y)          $I_x \cap I_y = \emptyset \wedge \partial_x \cap \partial_y = \emptyset$
meet(x,y)              EC(x,y)          $I_x \cap I_y = \emptyset \wedge \partial_x \cap \partial_y \neq \emptyset$
overlap(x,y)           PO(x,y)          $I_x \cap I_y \neq \emptyset \wedge E_x \cap I_y \neq \emptyset \wedge I_x \cap E_y \neq \emptyset$
equal(x,y)             EQ(x,y)          $I_x \cap E_y = \emptyset \wedge E_x \cap I_y = \emptyset$
coveredBy(x,y)         TPP(x,y)         $I_x \cap E_y = \emptyset \wedge E_x \cap I_y \neq \emptyset \wedge \partial_x \cap \partial_y \neq \emptyset$
inside(x,y)            NTPP(x,y)        $I_x \cap E_y = \emptyset \wedge \partial_x \cap \partial_y = \emptyset$
covers(x,y)            TPP^{-1}(x,y)    $I_x \cap E_y \neq \emptyset \wedge E_x \cap I_y = \emptyset \wedge \partial_x \cap \partial_y \neq \emptyset$
contains(x,y)          NTPP^{-1}(x,y)   $I_y \cap E_x = \emptyset \wedge \partial_x \cap \partial_y = \emptyset$

Medium resolution/RCC-5 relations

$I9_r$ relation        RCC relation     definition (intersections)
---------------------  ---------------  ------------------------------------------------------------
disjoint/meet(x,y)     DR(x,y)          $I_x \cap I_y = \emptyset$
overlap(x,y)           PO(x,y)          $I_x \cap I_y \neq \emptyset \wedge E_x \cap I_y \neq \emptyset \wedge I_x \cap E_y \neq \emptyset$
equal(x,y)             EQ(x,y)          $I_x \cap E_y = \emptyset \wedge E_x \cap I_y = \emptyset$
coveredBy/inside(x,y)  PP(x,y)          $I_x \cap E_y = \emptyset \wedge E_x \cap I_y \neq \emptyset$
covers/contains(x,y)   PP^{-1}(x,y)     $I_y \cap E_x = \emptyset \wedge E_y \cap I_x \neq \emptyset$

Fig. 1. RCC-8 relations between two regions

Much of the RCC/$I9_r$ work is concerned with logical inference: how to deduce
things from RCC/$I9_r$ information, what sorts of inferences are tractable,
and so forth. Here we are concerned with a more direct question: given the
representation of a pair of regions, what RCC/$I9_r$ relations hold between them?

Algorithms to determine these relations depend on the representation of the
regions. We will assume (in continuous space) that we define a region by its
boundary, and that the boundary is composed of a sequence of line segments
(ordered as an endpoint-connected path); variants of the same algorithms would
work for other representations (such as a sequence of one or more parametric
curves). The algorithmic complexity of these can be stated in terms of the
number of segments on the boundary. Algorithms for finding whether the various
interiors, exteriors, and boundaries intersect are given in [10]. Summarizing,
finding whether and how the boundaries intersect is equivalent to finding the $K$
intersections in a set of $N$ line segments, which can be done in $O((N + K) \lg N)$;
these intersections can be used to tell whether the interiors and exteriors
intersect as well (given that the boundaries intersect). If the boundaries do not
intersect, either one region is contained within the other, or they are disjoint.
This can be determined by checking whether an arbitrary point of one region
is contained in the other (for each region), at a cost of $O(N)$ by traversing the
segments in order and counting the number of segments that are intersected by
a ray going right from the arbitrary point (taking care at vertex intersections);
the number of intersections will be odd if the point is contained in the region.
The intersection relations, and therefore the RCC/$I9_r$ relations, can
be determined in $O((N + K) \lg N)$ time, where $N$ is the number of line segments
and $K$ is the number of boundary intersections.
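The ray-counting containment test described above can be sketched as follows. This is a minimal illustration, not the algorithm of [10]: the function names, the vertex-list representation of a boundary, and the half-open rule used at vertices are assumptions made for the example, and points lying exactly on a boundary are not handled.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, boundary: List[Point]) -> bool:
    """Return True if p lies strictly inside the closed polygon `boundary`.

    `boundary` lists the vertices in path order; the last vertex connects
    back to the first.
    """
    px, py = p
    inside = False
    n = len(boundary)
    for i in range(n):
        (x1, y1), (x2, y2) = boundary[i], boundary[(i + 1) % n]
        # Half-open test: a vertex on the ray belongs to the upper segment
        # only, so a crossing at a vertex is counted once and a tangential
        # touch zero or two times. Horizontal segments are skipped.
        if (y1 > py) != (y2 > py):
            # x-coordinate where the segment meets the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside  # odd number of crossings means inside
    return inside


def nested_or_disjoint(a: List[Point], b: List[Point]) -> str:
    """Classify two polygons whose boundaries are known not to intersect."""
    if point_in_polygon(a[0], b):
        return "a inside b"
    if point_in_polygon(b[0], a):
        return "b inside a"
    return "disjoint"
```

Each call traverses the boundary once, which matches the linear cost stated in the text; the second function uses one boundary vertex of each region as the arbitrary point.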



5 Regions and Relations in Digital Geometry 

Examining the relations between regions as defined in RCC/$I9_r$ in digital
space requires that we define regions, their boundaries, their interiors, and
their exteriors in digital space. In continuous space, region boundaries are simple
closed curves that divide the plane into two disjoint non-empty connected
regions; importantly, any curve between a point in the region and its exterior
must intersect the boundary. We need to find an analogous definition in digital
space that shares these intrinsic properties.

As an analogue to a curve between two points we define a path of pixels.
A path is a sequence of pixels such that each successive element is $\Pi$-adjacent
to the previous one in the sequence; specifically, $p = (V_0, \ldots, V_k)$ is a path
from $V_0$ to $V_k$ iff $\forall\, 0 \le i < k:\ (V_i, V_{i+1}) \in \Pi$.

An entity that matches the desired boundary characteristics in digital space
is the Jordan surface defined in [3]. Define the boundary between two disjoint
subsets of $V$ as a subset of $\Pi$: given $A$ and $B$, two disjoint subsets of $V$,
$\partial(A, B)$, the boundary between $A$ and $B$, is defined as
$\partial(A, B) = \{(a, b) \mid (a, b) \in \Pi \wedge a \in A \wedge b \in B\}$.
Further, we say that a path $p = (V_0, \ldots, V_k)$ crosses a boundary $\partial$
iff either $(V_j, V_{j+1}) \in \partial$ or $(V_{j+1}, V_j) \in \partial$ for some $j : 0 \le j < k$.

If $S$ is a Jordan surface, it induces two sets, $I(S)$ and $E(S)$ such that

1. $S = \partial(I(S), E(S))$

2. $I(S) \cup E(S) = V$ and $I(S) \cap E(S) = \emptyset$

3. both $I(S)$ and $E(S)$ are $\Pi$-connected (that is, any two pixels in the set can
be connected by a path, as defined above).

4. every $\Pi$-path from a pixel in $I(S)$ to a pixel in $E(S)$ crosses $S$.

Given this we define: a simple region in digital space is the interior of a
non-empty Jordan surface $S$. This matches the characteristics given at the start of
this section: by being non-empty, a region must have at least 1 pixel, as must
its exterior. A path from the interior to the exterior must cross the boundary,
which is analogous to intersection here. Moreover, since both the region and its
complement are connected by paths that do not cross the boundary, we cannot
have simple regions with holes.

Consistent with earlier notation, we will use a single subscript for the
boundary between a region and its complement. Additionally we will extend the
notation for the sets $II(S)$, $IE(S)$, and $IN(S)$ (immediate interior, exterior, and
neighborhood of a surface) [3] to those sets defined on a boundary of a region:

$\partial_r = \{(c, d) \in \Pi : c \in r \wedge d \in V - r\}$

$II_r = \{c : (c, d) \in \partial_r\}$

$IE_r = \{d : (c, d) \in \partial_r\}$

$IN_r = II_r \cup IE_r$

Notice that these boundary definitions allow us to consider boundaries without
introducing new entities (as in [6], e.g.). The notion of intersecting the boundary
in continuous space is replaced by the discrete notion of a pixel-path crossing
the boundary. The definitions also capture the notion, from continuous regions,
that all of the area of a region lies in its interior.
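As a concrete reading of these set definitions, the following sketch computes the boundary and the immediate interior, exterior, and neighborhood of a region directly from a pixel set and an adjacency relation. The function names and the 4-neighbour square grid are illustrative assumptions; the definitions themselves do not depend on any particular tessellation.

```python
def grid_adjacency(width, height):
    """Symmetric 4-neighbour adjacency on a square grid, as ordered pairs.

    The square grid is only an example; any symmetric relation on the
    pixels could be supplied instead.
    """
    pairs = set()
    for x in range(width):
        for y in range(height):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height:
                    pairs.add(((x, y), (nx, ny)))
                    pairs.add(((nx, ny), (x, y)))
    return pairs


def boundary_sets(region, pixels, adjacency):
    """Return (boundary, II, IE, IN) for `region` within the pixel set `pixels`."""
    complement = pixels - region
    # Ordered adjacent pairs leading from the region into its complement.
    boundary = {(c, d) for (c, d) in adjacency if c in region and d in complement}
    immediate_interior = {c for (c, _) in boundary}
    immediate_exterior = {d for (_, d) in boundary}
    neighborhood = immediate_interior | immediate_exterior
    return boundary, immediate_interior, immediate_exterior, neighborhood


# A 3x3 block of pixels centred in a 5x5 grid.
pixels = {(x, y) for x in range(5) for y in range(5)}
region = {(x, y) for x in range(1, 4) for y in range(1, 4)}
boundary, ii, ie, inn = boundary_sets(region, pixels, grid_adjacency(5, 5))
```

In this example the centre pixel (2, 2) belongs to the region but not to its immediate interior, since none of its neighbours lies outside the region.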

Given this definition of a simple region, we can define the equivalent
RCC/$I9_r$ relations in digital space. The intersections between the interiors or
exteriors will map directly, as the interiors and exteriors of regions are sets of
pixels. Boundaries, however, must be handled differently. The boundary elements
are ordered pairs of pixels, and boundary behavior is not completely analogous
to a simple curve: a boundary may not form a path of pixels in $IN$, two
boundaries can cross without having any boundary elements in common, and so forth.
However, the only “interesting” boundary intersections used for RCC/$I9_r$ are
tangential in continuous space, intersections where the boundaries do not cross,
and the interiors of the two regions are either disjoint (meet/EC) or one is
contained in the other (coveredBy/TPP or covers/$TPP^{-1}$). Tangential
intersections have a digital analogue: if two boundaries $\partial_p$ and $\partial_q$ intersect tangentially,
then there exist pixels $V_i$ and $V_j$ such that $(V_i, V_j) \in \partial_p$, and either $(V_j, V_i) \in \partial_q$
(if the interiors are disjoint), or $(V_i, V_j) \in \partial_q$ if they are not.
Fig. 2. A sequence of diagrams $s_\Pi$ representing the adjacency relation $\Pi$





6 RCC Algorithms Using IDR 

Given our definitions for the RCC-8 relations in digital space, we can give IDR 
algorithms for determining which of these hold. In the following discussion, each 
pixel set is represented by a diagram with each pixel in the set colored BLACK 
and the rest colored WHITE. Given the definitions in Table 1, we will sometimes 
also need to compute the region boundaries. To compute a boundary, we can do 
the following: 

Let $s_\Pi$ be a sequence (without duplicates) of diagrams, each of which is all
WHITE with the exception of two BLACK pixels, $P_i$ and $P_j$ such that $(P_i, P_j) \in \Pi$.
Since $\Pi$ is symmetric, there are half as many diagrams in this sequence as
there are elements in $\Pi$. If we represent a region $A$ as the diagram where all
pixels in $A$ are BLACK and the rest are WHITE, we can calculate the diagram
of the immediate neighborhood $IN_A$ as the union of all the elements of $s_\Pi$ that
intersect both $A$ and its exterior (these elements correspond to the pairs in the
boundary, ignoring order). We can do this as an accumulation of a filter (in
IDR):

$IN_A = \alpha(\emptyset, \phi(\lambda x.(!\eta(A \wedge x)\ \&\ !\eta(\neg A \wedge x)), s_\Pi), \vee)$

(where we use & and ! to denote logical conjunction and negation respectively).

From this, we can easily compute the immediate interior $II_A$ and immediate
exterior $IE_A$ by intersection with the interior or exterior:

$II_A = IN_A \wedge A$
$IE_A = IN_A \wedge \neg A$

We illustrate this process in Figures 2-5. The diagrams are all regular hexagonal
tessellations, with the elements of $\Pi$ being those pairs of (hexagonal) pixels
whose boundaries meet.
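The accumulate-of-filter computation can be sketched in a few lines. This is an illustrative model only: a diagram is represented as the set of its BLACK pixels, the `idr_*` helpers are hypothetical stand-ins for the IDR operators rather than an existing library, and a square grid with 4-neighbour adjacency replaces the hexagonal tessellation of the figures to keep the example short.

```python
from functools import reduce

# A diagram is modelled as the frozenset of its BLACK pixels; every other
# pixel of the shared tessellation (`universe`) is WHITE.


def idr_and(d1, d2):
    return d1 & d2


def idr_or(d1, d2):
    return d1 | d2


def idr_not(d, universe):
    return universe - d


def idr_null(d):
    """True when the diagram is all WHITE."""
    return len(d) == 0


def idr_filter(predicate, sequence):
    return [d for d in sequence if predicate(d)]


def idr_accumulate(initial, sequence, combine):
    return reduce(combine, sequence, initial)


def grid_adjacency_diagrams(width, height):
    """One two-pixel diagram per unordered adjacent pair (the adjacency sequence)."""
    pairs = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                pairs.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                pairs.add(frozenset({(x, y), (x, y + 1)}))
    return list(pairs)


def neighborhoods(region, universe, s_pi):
    """Return (IN, II, IE) for `region` as an accumulation of a filter."""
    outside = idr_not(region, universe)
    # Keep the two-pixel diagrams that touch both the region and its exterior.
    straddles = lambda x: (not idr_null(idr_and(region, x))
                           and not idr_null(idr_and(outside, x)))
    in_a = idr_accumulate(frozenset(), idr_filter(straddles, s_pi), idr_or)
    return in_a, idr_and(in_a, region), idr_and(in_a, outside)


universe = frozenset((x, y) for x in range(4) for y in range(4))
s_pi = grid_adjacency_diagrams(4, 4)
region = frozenset({(1, 1), (2, 1), (1, 2), (2, 2)})
in_a, ii_a, ie_a = neighborhoods(region, universe, s_pi)
```

Note that the filter inspects every diagram in the adjacency sequence, each holding only two BLACK pixels, which is where most of the work goes.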
Fig. 3. Diagrams representing a region $A$ and its complement $\neg A$

Fig. 4. A sequence of diagrams equal to $\phi(\lambda x.(!\eta(A \wedge x)\ \&\ !\eta(\neg A \wedge x)), s_\Pi)$





6.1 RCC-8 Algorithms in IDR 

Given two regions $A$ and $B$ (plus the associated $II_A$, $IE_A$, $II_B$, and $IE_B$; a total
of six diagrams), we can determine whether the RCC-8 relations hold as follows:

— A and B are disjoint: in continuous space, this is true if neither the regions
nor their boundaries intersect. In digital space, this is true if the regions do
not intersect, and their boundaries do not meet tangentially. If the regions
do not intersect, the borders meet tangentially if there exist pixels $V_i$ and $V_j$
such that $(V_i, V_j) \in \partial_A$ and $(V_j, V_i) \in \partial_B$; for non-intersecting regions this
is true iff $II_A$ and $IE_B$ intersect (alternatively iff $IE_A$ and $II_B$ intersect).
In IDR, this can be logically (and algorithmically) represented as:

$\mathrm{disjoint}(A, B) \iff \eta(A \wedge B)\ \&\ \eta(II_A \wedge IE_B)$.

— A meets B: in continuous space, this is true if the regions have intersecting
boundaries, but their interiors do not intersect. Using $II$ and $IE$ as above
to show tangentially intersecting boundaries, we can represent this in IDR
as:

$\mathrm{meet}(A, B) \iff \eta(A \wedge B)\ \&\ !\eta(II_A \wedge IE_B)$.

— A overlaps B: in continuous space, this is true if the regions have intersecting
interiors, but neither is a subset of the other. If region A intersects region B’s
exterior, then A is not a subset of B, and vice-versa, so this is equivalent to
checking three intersections: $A \cap B$, $(V - A) \cap B$, and $A \cap (V - B)$. In IDR,
this can be expressed as:

$\mathrm{overlaps}(A, B) \iff\ !\eta(A \wedge B)\ \&\ !\eta(\neg A \wedge B)\ \&\ !\eta(A \wedge \neg B)$.

Fig. 5. Calculating the neighborhoods: $IN_A$ is the accumulation (by $\vee$) of the filtered
sequence in Figure 4. From this, $A$, and $\neg A$ we can calculate $II_A$ and $IE_A$ by anding
together diagrams

— A equals B: in continuous space, this is true if the regions have the same
interiors; equivalently, this is true if the interior of A does not intersect the
exterior of B, and vice-versa. In IDR:

$\mathrm{equals}(A, B) \iff \eta(A \wedge \neg B)\ \&\ \eta(\neg A \wedge B)$.

— A is covered by B: this is true in continuous space if A is a subset of B and
their boundaries meet. For boundary intersection we act as with meet, with
a slight difference: if A is a proper subset of B, the boundary intersections
will be pairs in the same order, so it is sufficient to check whether their II’s
(alternatively IE’s) intersect. In IDR, this can be represented as:

$\mathrm{coveredBy}(A, B) \iff \eta(A \wedge \neg B)\ \&\ !\eta(\neg A \wedge B)\ \&\ !\eta(II_A \wedge II_B)$.

— A is inside B: this is true in continuous space if A is a subset of B and their
boundaries do not intersect. It is similar to covered by, although the “proper”
part of the subset test is unnecessary if the boundaries do not meet. In IDR,
we have:

$\mathrm{inside}(A, B) \iff \eta(A \wedge \neg B)\ \&\ \eta(II_A \wedge II_B)$.

— A covers B: this is the same as “B is covered by A”, so simply swap A and B
in the above:

$\mathrm{covers}(A, B) \iff \eta(\neg A \wedge B)\ \&\ !\eta(A \wedge \neg B)\ \&\ !\eta(II_A \wedge II_B)$.

— A contains B: this is the same as “B is inside A”, so simply swap A and B in
the above:

$\mathrm{contains}(A, B) \iff \eta(\neg A \wedge B)\ \&\ \eta(II_A \wedge II_B)$.
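The eight tests can be collected into a single classifier. The sketch below is an illustration under stated assumptions, not the authors' implementation: regions are plain sets of pixels, Null is modelled as an emptiness test, the immediate interior and exterior are computed directly from an ordered adjacency relation rather than from a diagram sequence, and the helper names and square 4-neighbour grid are hypothetical.

```python
def grid_adjacency(width, height):
    """Symmetric 4-neighbour adjacency on a square grid, as ordered pairs."""
    pairs = set()
    for x in range(width):
        for y in range(height):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height:
                    pairs.add(((x, y), (nx, ny)))
                    pairs.add(((nx, ny), (x, y)))
    return pairs


def rect(x0, y0, x1, y1):
    """Pixels of the axis-aligned rectangle with inclusive corners."""
    return frozenset((x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1))


def rcc8(a, b, universe, adjacency):
    """Return the names of the RCC-8 relations that hold between a and b."""
    def null(d):
        return len(d) == 0

    def ii(r):  # immediate interior: region pixels adjacent to the exterior
        return frozenset(c for (c, d) in adjacency if c in r and d not in r)

    def ie(r):  # immediate exterior: exterior pixels adjacent to the region
        return frozenset(d for (c, d) in adjacency if c in r and d not in r)

    not_a, not_b = universe - a, universe - b
    ii_a, ii_b, ie_b = ii(a), ii(b), ie(b)
    tests = {
        "disjoint": null(a & b) and null(ii_a & ie_b),
        "meet": null(a & b) and not null(ii_a & ie_b),
        "overlap": not null(a & b) and not null(not_a & b) and not null(a & not_b),
        "equal": null(a & not_b) and null(not_a & b),
        "coveredBy": null(a & not_b) and not null(not_a & b) and not null(ii_a & ii_b),
        "inside": null(a & not_b) and null(ii_a & ii_b),
        "covers": null(not_a & b) and not null(a & not_b) and not null(ii_a & ii_b),
        "contains": null(not_a & b) and null(ii_a & ii_b),
    }
    return [name for name, holds in tests.items() if holds]


universe = rect(0, 0, 5, 5)
adjacency = grid_adjacency(6, 6)
```

For example, `rcc8(rect(0, 0, 1, 1), rect(2, 0, 3, 1), universe, adjacency)` reports that the two side-by-side blocks meet, while moving the second block one column further right makes them disjoint.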



6.2 RCC-5 Algorithms Using IDR 

Considering the regions without their boundaries will reduce the RCC-8 set to 
the RCC-5 relations as defined in Table 1. As above, we state these in terms of 
IDR, notably not using any boundary information: 

— A and B are disjoint or meet: in continuous space, this is true if the regions
do not intersect. In IDR, this can be represented as:

$\mathrm{disjoint/meet}(A, B) \iff \eta(A \wedge B)$.

— A overlaps B: this is the same as the RCC-8 version:

$\mathrm{overlaps}(A, B) \iff\ !\eta(A \wedge B)\ \&\ !\eta(\neg A \wedge B)\ \&\ !\eta(A \wedge \neg B)$.

— A equals B: this is also the same as the RCC-8 version:

$\mathrm{equals}(A, B) \iff \eta(A \wedge \neg B)\ \&\ \eta(\neg A \wedge B)$.

— A is inside or covered by B: this is true in continuous space if A is a proper
subset of B. In IDR, this can be represented as:

$\mathrm{inside/coveredBy}(A, B) \iff \eta(A \wedge \neg B)\ \&\ !\eta(\neg A \wedge B)$.

— A covers or contains B: this is the same as “B is inside or covered by A”, so
simply swap A and B in the above:

$\mathrm{covers/contains}(A, B) \iff \eta(\neg A \wedge B)\ \&\ !\eta(A \wedge \neg B)$.
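Because the RCC-5 tests use no boundary information, they reduce to a handful of set intersections. The following sketch is illustrative only (regions as sets of pixels, a hypothetical `rcc5` function returning the RCC names from Table 1); it evaluates the tests as a cascade, which for non-empty regions is equivalent to the five formulas above.

```python
def rect(x0, y0, x1, y1):
    """Pixels of the axis-aligned rectangle with inclusive corners."""
    return frozenset((x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1))


def rcc5(a, b, universe):
    """Return the RCC-5 relation between two non-empty pixel sets a and b."""
    not_a, not_b = universe - a, universe - b
    if not (a & b):
        return "DR"       # disjoint or meet: the regions share no pixel
    a_within_b = not (a & not_b)   # no pixel of a lies outside b
    b_within_a = not (not_a & b)   # no pixel of b lies outside a
    if a_within_b and b_within_a:
        return "EQ"
    if a_within_b:
        return "PP"       # inside or covered by
    if b_within_a:
        return "PP^-1"    # covers or contains
    return "PO"           # overlap


universe = rect(0, 0, 5, 5)
```

Each branch needs only a constant number of intersections and emptiness tests over the whole pixel set.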



6.3 IDR Algorithm Complexity 

The complexity of each algorithm for the RCC-8 and RCC-5 relations depends 
on the complexity of the IDR operators. In the following, when discussing par- 
allelism, we restrict ourselves to two cases: 

— if we are doing the same independent operation on all the pixels (or corre- 
sponding pairs of pixels), and 

— if we want to detect whether all of the pixels are WHITE. 
Table 2. IDR computational costs. $n$ is the number of pixels, $k$ is the number of
diagrams in the appropriate sequence(s); $g_p$ and $g_s$ are the costs of applying the
function that is part of the operator on a parallel and sequential machine respectively.
*Parallelism restricted to tessellation as above

Operation                      parallel*     sequential
Or, And, Overlay, Peel, Not    $O(1)$        $O(n)$
Map, Filter                    $g_p O(k)$    $g_s O(k)$
Null                           $O(\lg n)$    $O(n)$
Accumulate                     $g_p O(k)$    $g_s O(k)$

That is, parallelism is related only to the tessellation, not to other data entities.

The IDR operators Or, And, Overlay, and Peel (section 3) are evaluated by 
applying the operator on each pair of corresponding pixels across two diagrams 
(similar for Not on one diagram). Given that each pixel-pixel computation is 
independent of all of the others, this is quite naturally done on a parallel ma- 
chine. With a sufficient number of processors (one per pixel, e.g.), these can each 
be done in constant time; it would take time linear in the number of pixels on 
a sequential machine. Map and Filter, which apply a function to corresponding
tuples of diagrams in a set of diagram sequences (one sequence in the case of
Filter), have costs proportional to the number of elements in each sequence
times the cost of applying the function. Null can be done in log time on a
parallel machine (as above), or in linear time on a sequential machine. Accumulate,
which combines a sequence of diagrams into one using a sequence of function 
applications, can be done (on a sequential machine) in time proportional to the 
length of the sequence times the cost of the function. The same bounds hold on 
a parallel machine given the above restrictions on parallelism, although the cost 
of applying the function may be less. These are summarized in Table 2. 

Given this, we can express the complexity of determining what RCC/$I9_r$
relations hold between two regions $A$ and $B$. Once $II_A$, $IE_A$, $II_B$, and $IE_B$ have
been computed, each relation can be determined in log time on a parallel machine
(linear time on a sequential machine). Each of these can be computed using
a constant number of IDR Ands and Nots, plus a constant number of IDR null
tests, plus a constant number of boolean ands and nots. On a parallel machine,
therefore, these can be calculated in $O(\lg n)$ time; on a sequential machine it
would take $O(n)$ time.

To calculate most of the RCC-8 relations, it is necessary to calculate the $II$
and $IE$ diagrams, which can be easily done from the $IN$ diagram. Recall, for
region $A$,

$IN_A = \alpha(\emptyset, \phi(\lambda x.(!\eta(A \wedge x)\ \&\ !\eta(\neg A \wedge x)), s_\Pi), \vee)$

Using Table 2, the cost of this is the cost of the accumulation ($\alpha$), plus the
cost of the filter ($\phi$). The cost of the accumulation is the size of the sequence
(the number of boundary pairs) times the cost of the IDR Or function, that is
$O(|\partial_A|)$ for parallel, $O(|\partial_A||V|)$ for sequential (using the notation $|S|$ to denote
the cardinality of a set or sequence $S$). The cost of the filter is proportional to
the size of its sequence times the cost of the IDR null function, which yields
$O(|\Pi| \lg |V|)$ for parallel, $O(|\Pi||V|)$ for sequential. Since $|\partial_A| < |\Pi|$, the overall
costs are $O(|\Pi| \lg |V|)$ (parallel), or $O(|\Pi||V|)$ (sequential).

Summarizing from this, the costs of determining the relations using IDR for
the RCC-5 relations are $O(\lg |V|)$ (parallel) or $O(|V|)$ (sequential); for the six
RCC-8 relations that use the boundary information (i.e. all but equals and
overlap) they are $O(|\Pi| \lg |V|)$ (parallel) or $O(|\Pi||V|)$ (sequential).

For most reasonable cases, we can bound $|\Pi|$ to be linear in the number of
pixels. Suppose our pixels are polygonal. Then we can define a planar graph
where the vertices of the graph are the vertices of the pixels, the edges are the
segments connecting them (the sides of the polygons), and the faces correspond
to the pixels. If the number of pixels that meet at a point is bounded by a constant,
and either 1) the number of sides that a pixel has is bounded by a constant
or 2) each vertex in the planar graph has degree of at least three, then $|\Pi|$ will
be $O(|V|)$. This is obviously so for 1), since each pixel has no more neighbors
than the product of these constants; for 2), the total numbers of edges, faces, and
vertices are pairwise proportional [10], so the total number of neighbor relations
is bounded by a constant times the number of faces, which correspond to pixels.
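The reasoning behind case 2) can be spelled out with Euler's formula. The derivation below is a sketch under assumptions that are not stated in the text: the planar graph is connected, has $v$ vertices, $e$ edges, and $f$ faces (the pixels plus one outer face, so $f = |V| + 1$), at most $c$ pixels meet at any vertex, and two pixels are adjacent when they share a side or a vertex.

```latex
\begin{align*}
v - e + f &= 2 && \text{(Euler's formula)}\\
3v &\le 2e && \text{(every vertex has degree at least 3)}\\
e &= v + f - 2 \le \tfrac{2}{3}e + f - 2
  \;\Longrightarrow\; e \le 3(f - 2) = 3(|V| - 1)\\
v &\le \tfrac{2}{3}e \le 2(|V| - 1)\\
|\Pi| &\le 2\Bigl(e + \tbinom{c}{2}\,v\Bigr)
  \le \bigl(6 + 2c(c-1)\bigr)(|V| - 1) = O(|V|)
\end{align*}
```

The last line counts at most one unordered pair per shared side and at most $\binom{c}{2}$ pairs per shared vertex, and doubles the total because $\Pi$ is symmetric.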



7 Related Work 

This work relates most directly to two different areas of research, digital geom- 
etry, and diagrammatic reasoning based on pixel computations. 

Digital geometry [8] analyzes the spatial topological and geometric problems 
of discrete grid worlds. Because our work is also based on discrete planar
partitions, much of this work is directly applicable, although generally restricted to
certain regular tessellations. Herman’s work [3, 19] is particularly useful here, as 
his definition of digital space is least restrictive, and his notions of near-Jordan 
and Jordan surfaces apply well to these problems in digital space that is not 
necessarily partitioned as a grid. 

A number of authors examine the topological problems inherent in using dis- 
crete regions instead of continuous space. Khalimsky et al. [20] proposed a topol- 
ogy on a square grid for digital image representation that allows a Jordan Curve 
theorem, and overcomes certain connectivity problems. Latecki [21, 4] defined 
a well-composed set that does not allow certain local pixel configurations. Win- 
ter and Frank in [6] defined a Hybrid Raster Representation where each pixel 
(cell) is integrated with a boundary of edges and nodes. More generally, Kong 
and Rosenfeld [12] present a comparison between graph-based and topology- 
based representations for digital pictures; Egenhofer and Sharma [22] discuss 
the modeling of the topological relations among bounded objects embedded in 
the discrete space $\mathbb{Z}^2$.

The application used in this paper, the spatial relationships between regions,
has been related to work in qualitative spatial reasoning; its initial motivation
was based on Allen’s [23] use of intervals for temporal reasoning. This work has
been continued and extended to deal with more complex regions as well as
uncertainty; see e.g. [14, 24, 25, 9]. Raugh et al. [18] performed cognitive testing
on the adequacy of the RCC/$I9_r$ relations (and others) to describe the spatial
relationship between any two regions.

Fig. 6. An example PRS rule, as in BITPICT. The rule is given at the top, the three
grids represent the original diagram, a diagram obtained by a single application of the
rule, and the ultimate resulting diagram

An alternative approach to computation in digital space is the use of pixel 
rewrite systems. Most directly applicable is the work of Furnas [26, 27], whose 
BITPICT system (and its successors) can solve problems by direct manipulation 
of a grid of pixels. 

This approach involves defining rewrite rules that act on patterns of pixels.
Rewrites are specified by two blocks of pixels: if the “left-hand side” matches
an area in the grid, it can be replaced with the block given as the “right-hand
side”. A computational step here is the application of one of these rules to a place
on the grid. Since multiple rules may apply to a given grid (and rules may apply
in multiple places), a choice mechanism is used to select the “best” rewrite,
which is applied; the process then continues with the new grid. A simple example
of such computation is illustrated in Figure 6. There are a number of interesting
shape manipulations possible: reshaping areas, moving areas, changing topology,
and so forth. There are certain limitations inherent in this approach, notably that it
is only defined for regular grids of pixels. As in IDR, there is no explicit notion
of connectivity (other than that imposed by the blocks in the rules).
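A single rewrite step of this kind can be sketched as below. This is a toy illustration and not BITPICT itself: the grid is a list of rows of 0/1 values, there is only one rule, and the choice mechanism is assumed to be "first match in row-major order"; all function names are hypothetical.

```python
def find_match(grid, lhs):
    """Return the top-left (row, col) of the first area equal to `lhs`, or None."""
    h, w = len(lhs), len(lhs[0])
    for r in range(len(grid) - h + 1):
        for c in range(len(grid[0]) - w + 1):
            if all(grid[r + i][c + j] == lhs[i][j]
                   for i in range(h) for j in range(w)):
                return r, c
    return None


def apply_once(grid, lhs, rhs):
    """Apply the rule lhs -> rhs at the first match; return None if no match."""
    pos = find_match(grid, lhs)
    if pos is None:
        return None
    r, c = pos
    new_grid = [row[:] for row in grid]   # leave the input grid untouched
    for i in range(len(rhs)):
        for j in range(len(rhs[0])):
            new_grid[r + i][c + j] = rhs[i][j]
    return new_grid


def run(grid, lhs, rhs, max_steps=1000):
    """Repeat the rewrite until the rule no longer matches (or a step limit)."""
    for _ in range(max_steps):
        next_grid = apply_once(grid, lhs, rhs)
        if next_grid is None:
            return grid
        grid = next_grid
    return grid


# Rule: a BLACK pixel with a WHITE pixel to its right moves one step right.
move_right_lhs = [[1, 0]]
move_right_rhs = [[0, 1]]
```

Running this rule on a grid slides BLACK pixels to the right until each is blocked by the edge of the grid or by another BLACK pixel.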

Other researchers have looked at similar pixel rewrite systems, notably Ya- 
mamoto [28] who allowed programs to have more high-level structure. Cellular 
automata ([29], e.g.) are similar in that they have defined rewrite rules for pixels, 
but are more restrictive in how a rule may be specified (a rewrite is based only 
on the value of a pixel and its immediate neighbors). 

8 Conclusions 

In this paper we looked at inter-diagrammatic reasoning as a possible logical 
and computational framework for digital geometry problems, particularly those 
using digital pictures. Its features are generally consistent with digital geometry 
in the plane; other than its bounded nature, it is less restrictive than most in
terms of how pixels can partition the plane. While it does not explicitly define
the connectivity within a diagram, we showed how the adjacency relation $\Pi$ can
be modelled by a sequence of diagrams.

We defined the digital analogue of a simple planar region, based on the
notion of a Jordan surface [3]. Using this, we showed how IDR could be used to
determine the RCC/$I9_r$ relations between two simple digital regions. The
specification of the IDR algorithms was quite compact; given an implementation of
the IDR primitives, the algorithms would be extremely easy to implement. These
implementations were extremely efficient for those relations that did not require
using the adjacency information (those that did not look at tangential border
intersection). While it was easy to specify the diagram sequence that encoded
the adjacency information, using these diagrams (with only two non-WHITE
pixels) meant that we dealt with very little information at each computational
step, and did not gain much from the inherent parallelism of the operators.

This work raises a number of questions for future study. Is it possible to
examine more information in parallel: specifically, is there a way to encode
adjacency information effectively, in a constant number of diagrams for example, so
we could avoid using the relatively long sequences of diagrams to get boundary
information? Are there other programming/control structures that could extend
IDR in useful ways without making it too difficult to implement efficiently?
Is there a reasonable way to encode RCC/$I9_r$ relations so that we could use
a PRS (like that of Furnas) to solve these problems, and then compare the algorithmic
efficiencies?

IDR provides powerful and general operators for working in digital geometry.
Overall, the strength of IDR as a logical and computational framework is its
ability to deal with whole diagrams (or multiple diagrams) at a time, which leads
to very compact specifications (and efficient implementations) where
problems can easily be decomposed into individual pixel computations. Conversely,
however, extracting inter-pixel information from a single diagram may be more
difficult if such decomposition is not possible. It is important, therefore, to find
efficient encodings of the appropriate inter-pixel information (as we did here,
for example, with the RCC/$I9_r$ boundaries) so that the inherent parallelism of
operating on the whole diagram is utilized.
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Abstract. Isaac is a rule-based visual language for mobile robots using 
evidential reasoning and a fuzzy inference engine. A prototype inference 
engine for Isaac has been implemented, permitting experiments with the 
Isaac language. This paper discusses this inference engine, describes some 
preliminary experiences with programming Isaac rulesets, and proposes 
future optimizations and enhancements to the inference engine. 



1 Introduction 

Isaac is a project developing a rule-based visual language for geometric reasoning,
specifically for use in mobile robots. Some of its features include: 

1. Visual representation of geometric concepts 

2. Explicit representation of uncertain knowledge regarding the robot’s sur- 
roundings 

3. Evidential reasoning for model updating 

4. Consistent handling of input, output, and inference rules[10] 

In this system, all of the concepts of the language are geometric, the goal being 
to determine the extent to which a purely geometric formulation can be used to 
specify a reactive robot control environment. The model being manipulated by 
the ruleset is geometric; the rules themselves are activated by intersections of 
geometric objects in the rules with objects in the model. 

In this paper, we discuss recent work on Isaac’s implementation. The par- 
ticular focus of the paper is on the development of a prototype inference engine 
interpreting Isaac rulesets, and its use in a simulator allowing testing of rulesets. 

In the following two subsections, we briefly review Isaac’s model and rules.
Following this Introduction, Section 2 describes the inference engine proper.
Section 3 describes the system’s user interface. Finally, Sections 4 and 5 provide
a discussion of our experiences to date with the system, and suggest avenues for
future research.
[Figure 1: three panels. (a) Maze and Robot; (b) Obstacles and Trace; (c) Combined Maze and Map]

Fig. 1. Example Isaac model



1.1 Environment Model 

Isaac’s environment model is a set of planes representing features of interest. 
A separate plane is used for each type of feature; for instance, obstacles to the 
robot’s progress or a path to be followed. Points in the planes are assigned 
Dempster-Shafer belief values, maintaining information as to the system’s belief 
in the presence or absence of features at that point. This is represented in this 
paper as (b, d) where b is the degree of belief in the feature at that point and
d is the degree of disbelief. The sum of b and d at a point must not exceed 1; 
ignorance regarding the presence or absence of a feature is represented by a sum 
of less than 1 [12]. Model planes are processed independently; this both provides 
a simpler semantics for a programmer, and also avoids a combinatorial explosion 
in the number of subsets of planes which would need to be maintained by the 
belief functions. 
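To make the (b, d) representation concrete, the sketch below combines two pieces of evidence about a single point using Dempster's rule of combination on the two-element frame {present, absent}. Treating Dempster's rule as the update is an assumption made for illustration; this passage says only that Dempster-Shafer belief values are stored, and the function name `combine` is hypothetical.

```python
def combine(e1, e2):
    """Combine two (belief, disbelief) pairs with Dempster's rule.

    The mass not committed to belief or disbelief (1 - b - d) represents
    ignorance and is assigned to the whole frame {present, absent}.
    """
    b1, d1 = e1
    b2, d2 = e2
    u1, u2 = 1.0 - b1 - d1, 1.0 - b2 - d2
    # Mass assigned to contradictory conclusions (present versus absent).
    conflict = b1 * d2 + d1 * b2
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict
    belief = (b1 * b2 + b1 * u2 + u1 * b2) / norm
    disbelief = (d1 * d2 + d1 * u2 + u1 * d2) / norm
    return belief, disbelief
```

Under this rule a reset value of (0, 0) acts as an identity: combining it with any evidence returns that evidence unchanged, which fits its reading as "no information at all".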

When the planes are defined, the programmer is able to set an initial belief 
function for each plane. Typically, a plane that will be used to represent obstacles 
or other features in the arena will be given a reset belief function of (0, 0) (no 
information at all regarding presence or absence of obstacles), while planes being 
used to manage information regarding progress made by the robot (for instance, 
passageways which have been explored) will be given a reset belief function of 
(0, 1) (the system is certain that nothing has been labelled by the robot).

An example of an Isaac environment model, generated in the course of solving 
a maze, is shown in Figure 1. In Figure 1(a), the arena being explored is
shown as a grey path on a black background (this map is taken from a maze in the
Iveagh Gardens in Dublin, Ireland [3]). The robot is represented as a black circle
with a white wedge showing the current direction of travel. Figure 1(b) shows 
the model which has been developed to this point in the robot’s exploration. 
Three planes are being used for this model: in the on-line version of this paper
the sides of the maze which have been discovered at this point are shown in
red, while the line up the middle of the path is shown in blue¹. In addition,
the dead end which the robot has explored and back-tracked is shown in green.
Saturation is being used to display confidence in the presence of obstacles with
brightness being used to display confidence in their absence, as described in [9].
In this partial map, this results in the path itself being shown as white (as there
is a very high confidence that there are no obstacles in the path) and the edges of
the path being shown in red. The blue plane is being used to trace the progress
of the robot, resulting in the line proceeding up the center of the path.

¹ For the remainder of the paper, all references to color will refer to the on-line version.

[Figure 2: two rules. (a) Rule Marking First Maze Traversal, annotated “mark where I’ve
been in blue”; (b) Rule Marking Second Maze Traversal, annotated “mark where I’ve been
twice in green”]

Fig. 2. Isaac rules for path marking

1.2 Rules 

Isaac rules take the customary form of a left hand side containing predicates to 
be satisfied and a right hand side containing assertions to be made as a result 
of rule firing. In Isaac’s case, the predicates take the form of geometric objects 
to be matched against the model, and the assertions take the form of polygons 
to be entered into the model. Two typical rules in Isaac are shown in Figure 2. 

The rules are represented by two figures: on the left hand side is a geometric
object to be matched in order to activate the rule, while the polygons to be
inserted in the model appear on the right. These rules are used to mark paths
which have been explored by the robot: the rule in Figure 2(a) has no left
hand side (the empty octagon represents the robot itself; this is always shown
to provide a reference for the development of the rules), so the rule is always
activated. Its right hand side marks the locations where the robot has been; it
is responsible for the blue line in the map shown in Figures 1(b) and 1(c). The
rule in Figure 2(b) identifies a path which is being explored for the second time,
and leaves a green trail to mark the second traversal of the path. This enables
the use of Tremaux’s Rule to solve island mazes [5].

[Figure 3: rule annotated “basic wall-following rule: if I’m too far from the wall, slow
down the near motor”]

Fig. 3. Wall following rule

[Figure 4: rule annotated “map forward (sensor 2)”]

Fig. 4. Forward sensor rule

Rules can also be used to control actuators. Figure 3 shows the rule used 
for following the left wall of a maze. The left hand side of this rule matches the 
absence of features: the empty green triangle at the front of the robot establishes 
that this is not the second time the path has been followed, and the empty red 
triangle to the left of the robot establishes that there is no obstacle at the given 
distance to the robot’s left. If both of these triangles are matched by the lack of 
a green line and the lack of a maze wall respectively, the rule is activated. The 
grey triangle in the rule’s right hand side labelled m0 (for motor 0) controls the
robot’s left motor. The triangle’s position and saturation control the extent to 
which the motor is enabled; the effect of this rule is to attempt to reverse the 
left motor, turning the robot to the left. 

Finally, it is possible to define rules which are parameterized by sensor inputs. 
Figure 4 shows the rule which is used to process the forward sonar sensor. The 
arrows labelled s2 are used to represent the input from sonar sensor 2, which
faces directly forward. In general, an arrow can be placed arbitrarily in a rule
right hand side; its origin and orientation are fixed in relation to the robot 
by its position in the rule, while its length is determined by the actual reading 
(transformed appropriately as described in [LI] to reflect geometric values rather 
than a raw input). A point placed along (or beyond) an arrow provides a scaling
factor. 

The result of this rule is to insert two triangles into the model: a triangle ex- 
tending from the center of the robot platform to the sensor return range marking 
an area where no obstacles are present, and a triangle at the sonar return range 
marking the presence of an obstacle. The triangles’ widths are 25% of the sensor 
return value. 

In addition to the model manipulation rules described here, the full Isaac sys- 
tem also includes navigation rules for moving the robot’s position in a model [8]. 
In order to experiment with model manipulation, the prototype inference en- 
gine does not process these rules. Instead, the simulation maintains the correct 
position and orientation of the robot. 

2 Inference Engine 

The inference engine operates on the rules in two phases: first, the degree of 
activation of each rule is determined, and second, the model is updated to reflect 
the firing of the rules. This results in a conceptually parallel execution of the 
rules. In this section, we discuss the inference engine itself. 



2.1 Activation 

In order to determine the activation of a rule, all of its patterns are matched 
against their corresponding model planes. The pattern is transformed according 
to the position of the robot in the model, after which the model points are 
examined for either the presence (in the case of a filled triangle) or the absence 
(in the case of an empty triangle) of an object. For a filled triangle, the degree of 
the match is determined by the maximum model belief value in the intersection; 
for an empty triangle, the degree of the match is determined by the minimum 
model disbelief value. The intent is that for a filled triangle, we have a good 
match if an object is present anywhere in the triangle; for an empty triangle, we 
only have a good match if we have good reason to believe that there is no object 
present anywhere in the triangle. 

A rule is activated to the extent of the worst match of all its patterns. A rule 
that has no left hand side is always fully activated. 
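
The activation computation described above can be sketched in a few lines. This is a minimal illustration, not Isaac's implementation: the names `pattern_match` and `rule_activation` are hypothetical, and each pattern is assumed to have already been transformed and intersected with its model plane, yielding a non-empty list of (belief, disbelief) pairs for the model points it covers.

```python
def pattern_match(cells, filled):
    """Degree of match for one left hand side triangle.

    cells  -- (belief, disbelief) pairs for the model points inside the
              transformed triangle (assumed non-empty)
    filled -- True for a filled triangle (presence), False for an empty one
    """
    if filled:
        # Good match if an object is believed present anywhere in the triangle.
        return max(b for b, _ in cells)
    # Good match only if absence is believed everywhere in the triangle.
    return min(d for _, d in cells)


def rule_activation(patterns):
    """Activation of a rule: the worst match over all of its patterns.

    patterns -- list of (cells, filled) tuples; an empty list stands for a
                rule with no left hand side, which is always fully activated.
    """
    if not patterns:
        return 1.0
    return min(pattern_match(cells, filled) for cells, filled in patterns)


# A filled triangle over points with beliefs 0.1 and 0.8, and an empty
# triangle over points with disbeliefs 0.9 and 0.6.
example = [([(0.1, 0.0), (0.8, 0.0)], True),
           ([(0.0, 0.9), (0.0, 0.6)], False)]
activation = rule_activation(example)
```

Here the filled triangle matches to degree 0.8 and the empty triangle to degree 0.6, so the rule as a whole is activated to the extent of the worse match.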

2.2 Firing 

After the activation of all the rules has been determined, they are all fired to the 
extent that they are activated, and all of the polygons on the rule’s right hand 
side are entered into the model.

When a rule is fired, the objects making up its right hand side are trans-
formed according to the robot’s position in the environment model. Each right
hand side polygon’s belief values are multiplied by the rule’s activation, and the 
model is updated using Dempster’s Rule of Combination. Note that while this 
is described as “entering a polygon in the model,” the result of the updating 
may be increasing confidence in an object’s presence, increasing confidence in 
its absence, or decreasing confidence in either or both. As Dempster’s Rule is 
commutative and associative, the order of firing does not matter. 
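
A small sketch may make the update step concrete. It assumes that a model value is a pair (b, d) of belief and disbelief over the two-element frame {present, absent}, with the remaining mass 1 - b - d uncommitted; that interpretation, and the function names `discount` and `combine`, are assumptions made for illustration rather than details taken from the Isaac implementation.

```python
def discount(value, activation):
    """Scale a right hand side polygon's (belief, disbelief) by the rule's activation."""
    b, d = value
    return (b * activation, d * activation)


def combine(m1, m2):
    """Dempster's Rule of Combination on the frame {present, absent}.

    Each argument is (belief, disbelief); the uncommitted mass is 1 - b - d.
    """
    b1, d1 = m1
    b2, d2 = m2
    u1 = 1.0 - b1 - d1
    u2 = 1.0 - b2 - d2
    conflict = b1 * d2 + d1 * b2          # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + u1 * b2) / norm
    d = (d1 * d2 + d1 * u2 + u1 * d2) / norm
    return (b, d)


model_value = (0.5, 0.0)                   # current contents of one model point
polygon = discount((1.0, 0.0), 0.5)        # rule fired at activation 0.5
updated = combine(model_value, polygon)    # (0.75, 0.0)
```

Combining with the vacuous value (0, 0) leaves a model point unchanged, and swapping the arguments gives the same result up to rounding, which is why the firing order is immaterial.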

2.3 Motors 

In the motor control rule from Figure 3, the left motor (m0) is assigned a triangle
with vertices at (-4, -8), (-3, -7), and (-3, -9), with a belief of 0.05. In processing motor
control rules, only the degree of belief and the y-component of the center of area 
of the motor’s polygon(s) are considered; in this rule, the y-component is -8. 
There is also a default movement rule (not shown) with no left hand side (so 
it is always fully enabled) which assigns both motors a y-component of 8 and 
weight of 0.1. 

A motor’s speed is given by the weighted average of all of the rules that 
affect its speed. Consequently, if this rule were to be fired with an activation 
level of 1, the left motor would be set to a speed of 2.67 while the right motor 
would continue to have a speed of 8. The net result would be for the robot to 
turn toward the wall. For simulation purposes, motor speeds are adjusted so that 
a speed of “8” translates to a movement of one pixel in an execution cycle. 
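
The arithmetic behind these figures can be checked with a short sketch. The function name `motor_speed` is hypothetical, and the weight of each contribution is assumed to be the polygon's belief multiplied by the rule's activation, in line with the firing step of Section 2.2.

```python
def motor_speed(contributions):
    """Weighted average of the y-components assigned to one motor.

    contributions -- list of (y_component, weight) pairs, where weight is
                     assumed to be belief * rule activation
    """
    total_weight = sum(w for _, w in contributions)
    return sum(y * w for y, w in contributions) / total_weight


# Wall-following rule at activation 1 (y = -8, belief 0.05) plus the
# default movement rule (y = 8, weight 0.1).
left = motor_speed([(-8.0, 0.05 * 1.0), (8.0, 0.1)])
# Only the default movement rule affects the right motor.
right = motor_speed([(8.0, 0.1)])
```

The left motor comes out at (-0.4 + 0.8) / 0.15, which rounds to 2.67, while the right motor stays at 8.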

3 Simulation User Interface 

A user interface has been developed for controlling simulation execution and visu-
alization, as shown in Figure 5. The interface allows the user to control the speed
of execution (the user is able to single-step execution, run the simulation slowly, 
or run it as quickly as the simulation, inference engine and user interface are 
capable of execution), reset the simulation to its initial state, or select a new 
location for the robot to begin. The user can select whether the robot and arena 
are visible, and independently enable or disable display of each model plane. 
Finally, the current speed of each motor is displayed. 

The planes are assigned arbitrary hues by the ruleset designer, for use in 
visualizing the model as the system is in operation. 

Each pixel of the model area is displayed using the visualization described 
in [9]. Briefly, the plane is assigned a hue by the user. Saturation and brightness 
are used to represent (b, d), with increasing saturation being used to repre-
sent b and brightness being used to represent d. Figure 6, taken from [9], shows
saturation and brightness as a function of belief and disbelief for the red plane. 

When multiple layers are displayed, a crude heuristic is used to select the 
color at each pixel in the display. Each visible layer is examined for degree of 
belief and, if the reset disbelief value of the plane is non-zero (see Section 1.1), 
disbelief. The pixel is then assigned a color according to the following rules:

1. If the belief of the highest-belief displayed plane is greater than a threshold
(the default threshold is 0.25), the pixel is visualized according to that plane.

2. Otherwise, if the arena is being displayed and has an obstacle at that pixel,
the pixel is displayed as black.

3. Next, if there are any displayed planes with reset belief value other than (0,
1), the visualization of the highest-disbelief such plane is used for the pixel.

4. If all else has failed, the pixel is displayed in the neutral grey used for (0, 0).

Fig. 5. Isaac user interface

Fig. 6. Belief visualization (axis label: Bel(F))

A well-known pitfall in the development of visual interfaces is the use of
culturally-biased or perceptually difficult colors; the difficulty color-blind users
have in making use of interfaces using red and green is an example. In Isaac,
the hues chosen for color planes are not semantically meaningful. Planes are
assigned hues by the ruleset programmer; however, the hues can be changed at
will by the user of the simulation by selecting the color palette buttons on the
right side of the interface window. This does not prevent the use of inappropriate
hues (imagine a user selecting the same hue for two planes), but does at least
provide a mechanism for the user to make better choices.

4 Discussion 

Experimentation with a variety of arenas and rule sets has led to a number of 
observations regarding Isaac’s evidential geometric reasoning. 

First, it has proven possible to implement a number of standard reactive 
robotic tasks, such as obstacle avoidance, mapping, and maze solving. The use 
of levels of belief in the objects to be inserted, and especially in the motor con-
trols, has led to a programming methodology which is quite similar to Brooks’s
subsumption architecture [1]. The common methodology for creating a ruleset 
has been to begin by creating a basic rule for robot movement, and then create 
new rules with greater weights for responses to particular situations. The weights 
are then adjusted in response to the simulated robot’s performance in assorted 
arenas. 

Several intuitions regarding the development of rulesets have proven to be 
false. It was anticipated, for instance, that it would be most useful for objects 
appearing in rule left hand sides to be large, so it would be possible to identify 
objects in a large area. This approach led to rulesets that did more to expose 
“holes” in the robot’s sensor coverage than to solve their intended problems (the 
rule shown in Figure 3, with its very small area of non-obstacle in the left hand
side, is an example of a rule developed following the development of this insight).
For rules not based on maps directly derived from sensors, the result has been
exactly the opposite, with very large regions in the rule left hand sides picking up
marks left behind (as seen in Figures 2 and 3).

5 Future Work 

The development of this inference engine, and our experience with it, have led to
a number of possibilities for future work. 

5.1 Rulesets 

At present, the inference engine only handles model manipulation rules, not 
navigation rules (the simulation simply maintains the robot’s position, and the
correct position is used in rule evaluation). This has proved quite useful in ex- 
ploring the language itself, but these rules need to be added both for a more 
realistic simulation environment and to permit the system’s use in an actual 
mobile robot.

5.2 Data Structures and Algorithms

The data structures and algorithms being used in the prototype are very simple, 
due to the interest in having the system operational in as short a time as possible: 
the model is implemented with an array of pixels for each plane, and updating 
objects in the model is performed by updating each pixel in the object. Also, 
every rule is evaluated on every processing cycle. 

A two dimensional geometric library using simple polygons directly as the 
geometric primitives is currently being implemented, with the intent that it will 
replace the array-based model currently in use. It is anticipated that this will ex- 
ploit spatial coherence to significantly reduce both space and time requirements. 

Algorithmic improvements are of greater interest. There are well-known al-
gorithms, notably Rete [4] and TREAT [6], which act to evaluate rulesets more
efficiently than the naive approach described here. Rete has been extended to
fuzzy production systems [7]. We expect to exploit the new model representation
to extend the fuzzy Rete algorithm to a geometric fuzzy Rete.

Another avenue for algorithmic improvements is parallelism. Isaac’s rule pro- 
cessing semantics were designed to be conceptually parallel; the activation level 
of all rules is calculated, and then all rules with a non-zero activation level are ap- 
plied using only commutative and associative operators. Consequently, a parallel 
implementation of the inference engine should be feasible. At the very least, it is 
possible to separate the processing by planes in a shared-memory environment. 



5.3 Visualizations 

The visualization which was developed for this prototype has proved quite useful 
in observing the behavior of the robot and the rules in exploring environments. 
At the same time, experience has shown some deficiencies which we are working 
to address. 

First, while the heuristic being used to visualize the arena, robot, and model 
is reasonably successful (especially with a user interface that enables display 
of the individual planes easily), there is no sense in which multiple planes with 
relevant information can be displayed at a single pixel. We are investigating blending
functions which might better display more of the information visible at a given 
location in the model. 

Another area in which we hope to augment the current global visualization is 
by adding a local, robot-centric visualization to better display the rules currently 
under consideration. A common debugging technique with the current visualiza- 
tion is to define a small triangle in an otherwise-unused model plane in the right 
hand side of a rule, so the degree of activation of the rule is left as a trail behind 
the robot. While this has proved useful, it would also be very helpful to display
a large view of the robot itself in the context of the model, and to show the polygons
on the left hand sides of some of the rules with a saturation showing their degree
of activation.

5.4 Integration

Work has previously been done in the design and prototyping of the user interface 
for a visual editor for Isaac, but this has not progressed to the point of the visual 
editor being able to actually create ruleset files. Instead, the figures showing 
visual representations of the rules were generated with the xfig drawing tool, 
and translated by hand into the textual representation. This has actually been
quite instructive, as attempts to shortcut the process by directly writing textual
rules have shown a surprising level of difficulty in imagining the geometry of the
objects in the rules. Further development of this editor, so creation of rulesets will 
be directly possible, is another area in which we are proceeding. It is interesting 
to speculate on the possible results of combining the robot-centric visualization 
described in Section 5.3 with the rule editor, so it could become possible to 
create new rules based on a situation present in the model. Another approach 
to this combination, focussing on a direct manipulation of the robot model, can 
be found in [2].

Abstract. A scalechart is a set of statecharts, operating in a dense time
domain, whose behavior is self-similar at different scales. The simplicity 
of extracting proofs of behavior from scalecharts is demonstrated, based 
on the < and co relationships. An algorithm is presented which automat- 
ically extracts < relationships from a simplified version of scalecharts. 



1 Introduction 

A scalechart [1] is a set of Harel-style statecharts [2] that operate in a dense (ever- 
divisible) time domain. Structurally each statechart of the set consists of a set of 
nodes and a set of transitions between those nodes, which form the vertices and 
arcs respectively of a directed graph. Each statechart has a distinguished start 
node. A node contains one or more orthogonal components or orthocomps, each
of which may contain a statechart, forming a self-similar structure in space. 

Scalecharts are also self-similar in time. There are no generated events as- 
sociated with transitions in scalecharts; rather, generated signals are associated 
with nodes. Unlike events, signals have an extended duration in time. 

For example, in Fig. 1, the state D has a signal d associated with it. d remains 
asserted as long as D is active. D can only become inactive when transition E 
fires. E can only fire when signal k is asserted. 

Now suppose the scalechart of Fig. 1 is instantiated at time $t_0$. Then ac-
cording to the semantics in [1], we can guarantee that at some time $t_1 > t_0$
the top-level scalechart will be in state A, orthocomp B will be in state D, and
orthocomp C will be in state G. Because G has an enabled out-transition H,
it is possible for C to transit to state I and similarly to state K, in which k is
asserted, allowing B to transit to state F, which in turn allows C to transit back
to state G.

Using the < relation to mean “happened before” (as in [3]) and the co relation
to mean “happens concurrently with” (as in [4]), and using the convention of
expressing occurrence $o$ of signal $s$ as $s_o$, we can express this situation as

$$g_1 < k_1 < g_2 \;\wedge\; d_1 < f_1$$

$$g_1 \;\mathrm{co}\; d_1 \;\wedge\; d_1 \;\mathrm{co}\; k_1$$

$$k_1 \;\mathrm{co}\; f_1 \;\wedge\; f_1 \;\mathrm{co}\; g_2$$

A. Blackwell et al. (Eds.): Diagrams 2004, LNAI 2980, pp. 227–230, 2004.

Fig. 1. Signalling inside a compound scalechart node



Note that some of the inequalities are independent of the signal-occurrences; for
example, within the orthocomp B it is clear that $d_1 < f_1$, and within orthocomp C
we can say that $g_1 < k_1 < g_2$, and in general $g_i < k_i < g_{i+1}$ for all $i > 0$.



2 An Extraction Algorithm for Inequalities 

We assume that all nodes have a signal expression consisting of one signal. We 
also ignore input signals completely. This means that a transition is enabled 
whenever its from-state is active. 

The extraction algorithm works as follows: we first build the underlying di-
rected graph from the statechart we are analyzing. We then identify the cycles¹
and parallel trails² of this graph.

Once we have discovered the cycles and parallel paths, it is a simple matter 
to mark nodes which are internal to parallel paths, nodes which are members of 
a cycle and nodes which are start/end nodes of a cycle. Once these nodes are 
identified, it is possible to generate the inequalities. 

We can discover all the parallel trails and cycles of a rooted directed graph by 
growing all the possible trails from the root. We start with the trail consisting of 
the root vertex only (this is a trivial trail). We then add each outgoing arc from
the root and its to-vertex, to form a new trail. We can continue this process with
the end-vertex of each new trail, until we cannot find an outgoing arc that we can 
add, without breaking the rule that a trail does not have duplicate arcs. As the 
directed graph has a finite number of arcs, this process is bound to terminate. 

¹ A cycle is a circuit of the graph which contains no other circuit.

² A trail is a walk through the graph which contains no repeated arcs, though it may
contain repeated nodes.

Finding the Cycles: We grow all the trails we can, starting from the root.
When a trail grows to a vertex it has seen before, it contains a cycle. We therefore
mark this trail as mature, and stop it growing any more. We continue this way
until all our trails have stopped growing (either because they have matured, or
because they have run out of arcs). Then we extract the cycle from each mature
trail.
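
The cycle-finding step can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the graph is assumed to be a simple directed graph given as a dictionary from each vertex to its successors, and the names `find_cycles` and `canonical` are hypothetical. Because a trail is stopped as soon as it reaches a vertex it already contains, no arc can be repeated before that point.

```python
def canonical(cycle):
    """Rotate a cycle so that its smallest vertex comes first."""
    i = cycle.index(min(cycle))
    return cycle[i:] + cycle[:i]


def find_cycles(graph, root):
    """Grow trails from the root and extract the cycle from each mature trail.

    graph -- dict mapping a vertex to a list of successor vertices
    Returns a set of cycles, each a tuple of vertices in traversal order.
    """
    cycles = set()
    growing = [(root,)]                 # start with the trivial trail
    while growing:
        trail = growing.pop()
        for nxt in graph.get(trail[-1], []):
            if nxt in trail:
                # The trail has reached a vertex it has seen before: it is
                # mature, and the cycle is the part from that vertex onward.
                cycles.add(canonical(trail[trail.index(nxt):]))
            else:
                growing.append(trail + (nxt,))
    return cycles


# Orthocomp-like example: G -> I -> K -> G
loop = find_cycles({"G": ["I"], "I": ["K"], "K": ["G"]}, "G")
# Two cycles sharing the vertex b
two = find_cycles({"a": ["b"], "b": ["c", "a"], "c": ["b"]}, "a")
```

Rotating each cycle into a canonical form means that the same cycle reached through different mature trails is recorded only once.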



Finding the Parallel Trails: We define two trails as being parallel if: 

— they are both non-trivial 

— they have the same start vertex and the same end vertex 

— they have no other shared arc or vertex 

— they do not contain any circuits at their start or end vertices (unless the start 
and end vertices are identical). They may contain circuits at other vertices. 

We grow all possible trails from the root as before. When we find two distinct
trails which share an end vertex, we have two candidate parallel trails. Because 
we are allowing cycles in trails, we cannot assume that our two candidates are 
genuinely parallel: both, one or neither of the trails may have a cycle on its end 
vertex. We discard any pair where only one trail has a cycle on its end vertex. 

We now remove the common initial sub-trail of the two trails (this may 
be null), giving two non-trivial trails with the same start vertex and the same 
end vertex. Then we reject any pairs which have common internal nodes or 
transitions. Finally, for trails with distinct start and end vertices, we check that 
there are no cycles on the start vertex or the end vertex of our pair. 
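
A simplified sketch of the parallel-trail search is given below. It deliberately restricts itself to trails that repeat no vertex, so the checks for circuits at the start and end vertices are not needed; handling trails that contain circuits at other vertices is left out. The graph representation and the names `simple_paths` and `parallel_pairs` are assumptions made for illustration.

```python
from itertools import combinations


def simple_paths(graph, root):
    """Yield every trail from the root that repeats no vertex."""
    growing = [[root]]
    while growing:
        path = growing.pop()
        yield path
        for nxt in graph.get(path[-1], []):
            if nxt not in path:
                growing.append(path + [nxt])


def parallel_pairs(graph, root):
    """Return pairs of parallel trails as frozensets of vertex tuples."""
    found = set()
    for p, q in combinations(list(simple_paths(graph, root)), 2):
        if p[-1] != q[-1]:
            continue                      # candidates must share an end vertex
        # Remove the common initial sub-trail, keeping the last shared
        # vertex as the common start vertex.
        k = 0
        while k < min(len(p), len(q)) and p[k] == q[k]:
            k += 1
        a, b = p[k - 1:], q[k - 1:]
        if len(a) < 2 or len(b) < 2:
            continue                      # reject trivial trails
        if set(a[1:-1]) & set(b[1:-1]):
            continue                      # reject shared internal vertices
        found.add(frozenset((tuple(a), tuple(b))))
    return found


# A diamond below a single entry arc: r -> s, then s -> a -> t and s -> b -> t
diamond = {"r": ["s"], "s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
pairs = parallel_pairs(diamond, "r")
```

In the diamond example the common initial sub-trail r -> s is stripped, leaving the two trails from s to t as the only parallel pair.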

3 Conclusions 

We would like to extend the algorithm informally presented here to include 
transitions with signal labels and guards. Nevertheless the current algorithm 
shows that it is possible to extract proofs semi-automatically from scalecharts. 
It is my intention to incorporate this work into a scalechart drawing tool, which 
would check out designs as they are drawn, and correct errors “on the fly”.

Abstract. Diagram schemas allow us to organize broad and deep collections of
diagrams for theoretical analysis and system building. Formal intensional (rule- 
based) methods attempt to lay out the necessary and sufficient conditions for 
describing a class. For diagrams, it can be argued that the most useful schemas 
are extensional ones built around exemplars and prototypes based on central 
instances. Diagrams are then further classified in relation to these central 
instances. Many existing system designs use exemplar-based schemas 
implicitly. We describe how diagram exemplars and prototypes can be used in
the design of a wide variety of systems for generation, parsing, machine
learning, indexing and retrieval, and contrast exemplars with rule-based
approaches. Exemplar-based approaches are particularly well-suited for
dealing with HCI issues.



1 What Types of Schema Are There? 

To deal with diagrams in all their generality, it is important to design schemas that 
describe diagram structure, help to classify diagrams and aid in building systems. We 
describe the differences between rule-based (intensional) schemas and instance-based
(extensional) schemas, emphasizing the latter. We discuss the use of schemas, 
including examples from our own research. 

Rule-based schemas are typically based on diagram components and their 
attributes, but can involve captions. For data graphs (Figures 1 and 2), the 
components can include tick marks or data points at the lowest level and labeled scale 
lines at a higher level. A diagram parser is a constructive rule-based classifier 
producing a tree-based structural representation [4]. At a higher level, one type of 
diagram can be compared to another using additional rules, e.g., comparing a line 
chart to a bar chart [5]. Machine learning systems can produce rule-based classifiers 
in the form of decision trees based on vectors of attribute values that describe 
instances [7]. 

Instance-based schemas are built around exemplars and prototypes [6]. A
diagram exemplar is based on a collection of remembered or recorded instances that 
typify a particular class, such as the collection of choices of data plot styles presented 
in a spreadsheet application. A prototype is a particular instance that represents a 
class, as might be seen in a visual dictionary [1]. Case-based reasoning and memory-
based natural language processing are examples of the use of extensional approaches.

An ontology is an explicit rule-based specification of a conceptualization. The
construction of ontologies can be difficult, because of the problems of settling on 
(sub)category names, and the fact that the same set of entities can be described from 
different points of view depending on needs and uses [8]. Similarly, the choices for 
exemplars and prototypes can depend on context and use. 

The world of diagrams is huge (see http://www.diagrams.org); the focus of this 
paper is on the diagrams that populate the scientific and technical research literature, 
e.g., data graphs, system block diagrams, gene diagrams and even images with
overlaid annotations. 



2 Why Are Schemas Needed? 

The need for rule-based schemas is obvious. But attempting to apply classification 
rules to collections of objects can be fraught with difficulty. These difficulties led 
Wittgenstein to claim that subjective "family resemblances" are often the best we can 
do (#66 in [9]). The reasons for using extensional schemas are particularly compelling
for diagrams because their spatial organization and structure are complex and difficult
to analyze by rule-based methods.

But humans have few problems in understanding diagram instances and relating 
various diagrams to one another. Users of interactive systems can function most 
efficiently when they are presented with diagram instances, not lengthy attribute-
based descriptions. So it makes sense to structure the underlying systems, data
structures and indexes in ways that mirror human visual cognition. 

In diagram parsing it is useful to target grammars to exemplars and then deal 
with deviations and elaborations as additional issues. In designing diagram indexing 
and retrieval systems we can pick exemplars of each category to present to users of 
the systems. Diagram drawing and generation applications are often built around 
exemplars and then elaborate them. In a drawing application, a user could draw an 
arrowhead manually, but most will choose an arrowhead from among the exemplars 
which are built into the application. Such systems normally preselect one of these as 
the default (the prototype). When data is drawn from myriad sources in various 
styles, e.g., in diagram summarization [3], it is best to present it in a familiar default 
style, most rapidly assimilated by the user. In machine learning, an appropriate 
choice of examples, for both supervised and unsupervised scenarios, would be central 
examples. In a certain sense, each central example is maximally separated from 
examples in other classes; this can assist both machine learning and human learning. 

3 How Are Schemas Used? 

Before we can build large and comprehensive diagram analysis systems, we must deal 
with the fact that the overwhelming number of diagrams published electronically are 
in raster format, e.g., GIFs and JPEGs. These need to be converted to vector format 
(vectorized) for analysis. In designing this software, we have focused on prototypical
structures met at this level such as T-junctions, corners, occluding data points and the
convention of including large sets of similar elements, e.g., identical-length tick
marks, identical font styles in multiple labels, etc. (Crispell and Futrelle, in
preparation).

Once vectorized, diagrams can be parsed. In our diagram parsing systems [2,4], 
the grammars were constructed manually, starting with ones that describe the simplest 
central examples. A typical rule in such a grammar for a data graph such as Figure
1b or 1c is:

X-Axis -> X-Axis-Line X-Ticks X-Labels X-Annotation X-Text

It is clear from this example that the schema represented in the rule straddles the line 
between syntax and semantics. This adds power to the approach but has the 
disadvantage that a large number of specific grammars may be required. 
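
To make the rule concrete, the following sketch encodes it as a production and checks a collection of already-recognized components against it. The dictionary layout and the `apply_rule` helper are hypothetical illustrations, not the data structures of the parsing systems in [2,4]; a real diagram grammar would also test spatial constraints between the constituents.

```python
# The production from the text, written as head -> list of constituents.
GRAMMAR = {
    "X-Axis": ["X-Axis-Line", "X-Ticks", "X-Labels", "X-Annotation", "X-Text"],
}


def apply_rule(head, components):
    """Build a parse-tree node for `head` if every constituent is present.

    components -- dict mapping a constituent name to its recognized content
    Returns (head, [(constituent, content), ...]) or None if one is missing.
    """
    children = []
    for name in GRAMMAR[head]:
        if name not in components:
            return None
        children.append((name, components[name]))
    return (head, children)


recognized = {
    "X-Axis-Line": "line",
    "X-Ticks": "5 ticks",
    "X-Labels": "5 labels",
    "X-Annotation": "none",
    "X-Text": "axis title",
}
node = apply_rule("X-Axis", recognized)
```

The constituent names mix layout terms (lines, ticks) with meaning-bearing ones (labels, annotation), which is the straddling of syntax and semantics noted above.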

In diagram summarization [3], care must be exercised. A summarization should 
adopt the schema that the original author used, else the reader could be confused by 
the visual incongruities met when referring back to the original diagrams (Sec. 2.2 in
[3]). This is true even if the original is not designed using a style that is central in the
field.

In summary, the use of the extensional approach is not in the execution of the 
system. Rather it is used by system designers at a meta-level to decide what types of 
rules to build into functioning systems that work with diagrams. Direct presentation 
of diagram instances to users is another major use of exemplars.

Fig. 1. Instances of two diagram classes, (a) a bar graph, and (b,c) x,y data graphs using lines
connecting data points (from Nature 405, pg 904 (2000)). These both have an uncomplicated 
style that qualifies them as useful prototypes in an extensional classification system. Elements 
of these figures such as the error bars in a are standard and are therefore quickly and correctly 
recognized as such by any literate scientist/reader. Similarly, conventions such as having 
external tick marks with aligned numerical labels are also standard and easily understood. 
Using numerical attributes such as the presence of tick marks, rectangular regions, etc., we 
have built rule-based supervised machine learning systems that can accurately distinguish 
graphs similar to a from those similar to b and c [5].

Fig. 2. A complex non-central diagram that is nonetheless closely related to Fig. 1b and 1c
(part A of a figure in Tokumoto et al., BMC Biochemistry, 4 (2003)). All published scientific
diagrams rely on their captions, but this one even more so, because it is otherwise impossible to 
tell which y-scale label is associated with which data lines and points. Non-central examples 
require significantly more effort on the reader's part to understand. They also require more 
complex grammars for parsing.

Abstract. Knowledge discovery is one of humans' most creative activities.
As is often pointed out, humans frequently draw figures in the process of reasoning
and problem solving. Diagrammatic information from a figure gives us 
numerical and relational data for reasoning, and it also gives us cues for 
controlling reasoning. The author has developed four discovery systems in the 
domain of plane geometry. These systems automatically draw several forms of 
figures and observe the figures in order to acquire geometrical data. The data 
are used for extracting numerical expressions needed for discovering theorems, 
controlling discovery processes in order to avoid combinatorial explosion, and 
evaluating the usefulness of generated numerical expressions. As an approach
to automated scientific discovery, this paper discusses the roles of
diagrammatic information in the process of knowledge discovery. 

Keywords: Automated scientific discovery, discovery systems, diagrammatic 
reasoning, geometrical theorems. 



1 Introduction 

Knowledge discovery is one of humans' most creative activities. There are two
main approaches for machine discovery: data mining and automated scientific 
discovery. Although the former is intensively studied by many researchers, the latter 
is also important for clarifying the process of humans' intelligent activities. 

As research following the latter approach, the author has developed four discovery
systems in the domain of plane geometry. These systems automatically draw several 
forms of figures and observe the figures in order to acquire geometrical data. The data 
are used for extracting numerical expressions needed for discovering theorems, 
controlling discovery processes in order to avoid combinatorial explosion, and 
judging the usefulness of generated numerical expressions. This paper discusses the 
roles of diagrammatic information in the process of knowledge discovery.

2 Discovery Systems for Geometrical Theorems

Discovery systems that the author has developed are briefly explained below: 

• DST [1]: The system draws figures by adding lines on a triangle and by giving
constraints to sides or angles. By observing the figures, relations of sides and
angles are extracted in the form of numerical expressions. Well-known basic
geometrical theorems such as the Pythagorean theorem are rediscovered as the
result of transformation of observed numerical expressions.

• PLANET: The system accepts an input figure and observes similarity or 
congruence of triangles in the figure. Numerical expressions extracted from the 
figure are then combined to generate new expressions by substitutions. 
Expressions of closely located sides or angles are regarded as theorems. The 
system rediscovers theorems such as Menelaus's theorem and the addition theorem 
of trigonometric functions. 

• EXPEDITION[2]: The system draws figures by adding lines on a circle. 
Numerical values of the lengths of sides or measures of angles are observed from 
the figures. Theorems such as the power theorems and Thales' theorem are 
rediscovered inductively from the numerical values. 

• DIGEST[3]: The system accepts an input figure and observes the relations of 
areas of triangles in the figure. Numerical expressions extracted from the figure 
are then combined to generate new expressions by substitutions. Expressions 
that do not contain terms of areas represent relations among sides, which are regarded 
as theorems. The system rediscovers theorems such as Ceva's theorem and 
Menelaus's theorem. 

As an example of the above discovery systems, EXPEDITION is described in more 
detail below. The system draws figures by adding lines on a circle as shown in 
Fig. 1. By observing each figure, numerical data such as the lengths of line segments 
and the measures of angles are acquired. From two entries of approximately equal 
numerical values, a candidate theorem is obtained as shown in Fig. 2. 




Fig. 1. Generation of figures by drawing lines on a circle 

AB = 11.0 
BC = 7.5 
AC = 18.5 
AD = 8.3 
DE = 6.2 
AE = 14.5 

AB*AB = 121.0 
AB*BC = 82.5 
AB*AD = 91.3 

AD*AE = 120.4 


Fig. 2. A candidate for theorem 
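
The observation step behind Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration, not the EXPEDITION implementation: the segment lengths are the values listed in Fig. 2 (reading the first entry as AB = 11.0, which agrees with AB*AB = 121.0), while the function name `candidate_theorems`, the choice to multiply every pair of segments, and the 0.6% relative tolerance are assumptions made for this example.

```python
from itertools import combinations, combinations_with_replacement
import math

# Segment lengths as listed in Fig. 2.
lengths = {"AB": 11.0, "BC": 7.5, "AC": 18.5, "AD": 8.3, "DE": 6.2, "AE": 14.5}


def candidate_theorems(lengths, rel_tol=0.006):
    """Return pairs of products whose observed values are approximately equal.

    rel_tol is an assumed tolerance; the paper does not state one.
    """
    # Observe every product of two segment lengths (a segment may pair with itself).
    products = {
        f"{p}*{q}": lengths[p] * lengths[q]
        for p, q in combinations_with_replacement(lengths, 2)
    }
    # Two entries with nearly equal values suggest an equation between them.
    return [
        (x, y)
        for (x, vx), (y, vy) in combinations(products.items(), 2)
        if math.isclose(vx, vy, rel_tol=rel_tol)
    ]


candidates = candidate_theorems(lengths)
for left, right in candidates:
    print(f"candidate: {left} = {right}")
```

With these values the only pair inside the tolerance is AB*AB and AD*AE (121.0 versus about 120.4), which is the candidate shown in Fig. 2. A looser tolerance would admit accidental near-matches, so the threshold acts as a filter on spurious candidates.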




Fig. 3. Figures for rediscovering theorems 



Fig. 3 shows figures that EXPEDITION actually generates. Many well-known 
geometrical theorems are rediscovered from these figures, such as the power theorem 
(AB*AC = AH²) from figure (5), and Thales' theorem (angle ACB = 90°) from figure 
(6). 
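
The inductive route to Thales' theorem can be imitated numerically. The sketch below is an illustration, not EXPEDITION's code: it assumes that A and B are the endpoints of a diameter and that C is another point on the circle (a reading suggested by "angle ACB = 90°"), and the function name, radius, and sample positions are arbitrary choices.

```python
import math


def angle_acb(a, b, c):
    """Measure angle ACB in degrees from the coordinates of the three points."""
    ax, ay = a[0] - c[0], a[1] - c[1]
    bx, by = b[0] - c[0], b[1] - c[1]
    cosine = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


radius = 5.0                       # assumed radius
a, b = (-radius, 0.0), (radius, 0.0)  # endpoints of a diameter

# "Draw" several figures by placing C at different positions on the circle.
angles = [
    angle_acb(a, b, (radius * math.cos(math.radians(d)), radius * math.sin(math.radians(d))))
    for d in (20, 60, 110, 160)
]
print([round(x, 6) for x in angles])
```

Every observed angle comes out at 90 degrees up to rounding, which is the kind of repeated numerical regularity from which a theorem can be proposed inductively.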

3 Roles of Diagrams for the Discovery of Geometrical Theorems 

In the case of the above discovery systems, processing of diagrammatic information 
can be divided into two: 1) generation of figures by drawing lines or giving 
constraints, and 2) observing data from them. The latter data are divided further: 2a) 
data that represent relations of sides or angles that are the basis of theorem discovery, 2b) 
data for evaluating the usefulness of generated numerical expressions, and 2c) data 
for controlling transformation of numerical expressions. 

In case 1, the systems make use of emergent property for the acquisition of 
diagrammatic data. Figures often clarify relations that are not explicitly given. By 
observing a figure that is drawn based on a few explicit constraints, much data about 
implicit relations are also obtained. In case 2a, data obtained from diagrams are both 
relational (equations among sides or angles) and numerical (numerical data of sides or 
angles). The former data is used for deductive reasoning (transformation by 
substitution), and the latter data is used for inductive reasoning (verification of 
theorematic hypothesis). In case 2b, the systems make use of emergent property as the 
criteria for discovery. Evaluating the usefulness of numerical expressions is not an 
easy task in general. In the above systems, numerical expressions about closely 
located sides or angles are regarded as useful theorems. The systems utilize figures 
for both data for discovering theorems, and criteria for evaluating them. There is 
another criterion for evaluating usefulness based on syntactic form of numerical 
expression, such as symmetry or simplicity. Combining both criteria (one based 
on diagrammatic information and the other based on syntactic form) is one of the 
most interesting and challenging topics, and it has not yet been tackled in the above 
systems. 

Another important role of diagrams is to control the direction of the reasoning process 
(case 2c). It is often pointed out that humans tend to choose 
objects located close to the current viewpoint as the next target for reasoning. 
The strategy is based on an assumption that related objects are often closely located in 
a figure. In the above discovery systems, DST utilizes diagrammatic information as 
the strategy for transforming numerical expressions; the system tries to eliminate 
terms that are newly generated by drawing additional lines. Resultant expressions are 
about sides and angles that exist before drawing additional lines. 

4 Concluding Remark 

This paper briefly discusses the roles of diagrammatic information based on the 
development of discovery systems for geometrical theorems. Discovery of 
geometrical theorems is one of the most intelligent activities, and it requires reasoning 
using both numerical expressions and diagrammatic information. Development of the 
above systems is expected to help the understanding of humans' diagrammatic 
reasoning processes. The discussion in this paper is a first step toward clarifying the roles 
of diagrammatic information in humans' intelligent activities. 

Abstract. Imprecise diagrams (those with malformed, missing, or extraneous 
features) occur in many situations. We propose a five-stage architecture 
for interpreting such diagrams and have implemented a tool, 
within this architecture, for automatically grading answers to exami- 
nation questions. The approach is based on identifying (possibly mal- 
formed) minimal meaningful units and interpreting them to yield a mean- 
ingful result. Early indications are that the tool’s performance is similar 
to that of human markers. 



1 Introduction 

An imprecise diagram is one where required features are either malformed or 
missing, or extraneous features are included. Imprecise diagrams frequently occur 
in student answers to examination questions. The need to handle such diagrams 
has arisen from our investigations of the automatic grading of free-form text 
answers [5]. 

The automatic grading of answers in a textual form has received much attention 
[2]. Our approach is similar in that we currently do not attempt to 
address any higher-order semantic structures, which is equivalent to looking for 
key words or phrases in a sentential answer. 

Much of the activity in diagrammatic reasoning has been directed towards 
precise diagrams, such as the use of diagrams in mathematical proofs [4] and 
visual query interfaces to GISs [1, 3]. One diagrammatic grading system is DAT- 
sys [6], a diagrammatic front end to the CourseMaster marking system. DATsys 
provides a method for creating bespoke diagram editors, but does little to ad- 
dress how those diagrams are graded. In contrast, our work concentrates on the 
grading of possibly ill-formed or inaccurate diagrams. 

2 Our Approach 

Our general approach to interpreting imprecise diagrams consists of five stages: 
segmentation, assimilation, identification, aggregation, and interpretation. Seg- 
mentation and assimilation together translate a raster-based input into a set of 
diagrammatic primitives, e.g. boxes and text. In the identification stage, domain 
knowledge is used to identify low-level, domain-specific features, the minimal 

Draw a diagram showing how the data hazard inherent in the execution of the pair 
of instructions ADD R2,R3,R1 ; SUB R1,R5,R4 by a 4-stage pipeline, is overcome. 



Fig. 1. An example examination question 



meaningful units. These features are aggregated into higher-level, abstract fea- 
tures. Finally, the diagram is interpreted to produce meaningful results, such as 
a grade. 

We have implemented the identification and interpretation stages in a simple 
automatic grading tool. Since our initial interest is in these stages, we avoided 
the need for the segmentation and assimilation phases by using a drawing tool for 
input. Given that our previous work in automatic grading [5] has been successful 
without the recognition of abstract features, we decided to explore the problem, 
in the first instance, without aggregation. 

Malformed features are handled by an inference mechanism in the identifi- 
cation stage which identifies and repairs such features. Missing and extraneous 
features are dealt with in the interpretation phase: the former through the ap- 
propriate allocation of marks, while the latter are simply ignored. 

3 Identification and Interpretation 

The question shown in figure 1 was set in an online examination; its model 
solution is shown in figure 2. The significant features in this solution are the pair 
of four-stage pipelines and the “forward” link between them. 

In this domain, we have taken the minimal meaningful unit to be an asso- 
ciation, which is a pair of boxes connected by a link. Given this definition, we 
treat a diagram as a (possibly overlapping) set of associations. 

Under exam conditions, students often represented the ordering of the stages 
in a pipeline by the relative positioning of unconnected boxes (e.g. left-to-right, 
top-to-bottom). These implicit associations were recognized in the identification 
stage by assuming that two adjacent boxes were associated. This limited form 
of identification was sufficient for the examples we have seen to date. 

Interpretation of the diagram was guided by a “marking scheme”, where each 
association in the solution has a mark allocated. Each association in the answer 
was compared to each association in the solution, to give a set of similarity 




Fig. 2. A drawing of two pipelines with a link between them 

Table 1. Human vs. automatic grading 

        Human   Automatic 
Mean    2.78    2.73 
σ       1.05    1.09 


measures. The mark for each comparison was calculated by scaling the allocated 
mark by the similarity measure. The final mark was found by summing the 
highest scaled marks. 
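
The marking procedure described above can be sketched as follows. This is an illustrative reading, not the authors' tool: the paper does not define the similarity measure, so the position-wise label comparison in `similarity` is a hypothetical stand-in, the pipeline stage names and marks are invented for the example, and "summing the highest scaled marks" is interpreted as taking, for each association in the solution, the best-matching association in the answer.

```python
from typing import Dict, List, Tuple

# An association is a pair of boxes connected by a link: (source label, target label).
Association = Tuple[str, str]


def similarity(a: Association, b: Association) -> float:
    """Hypothetical similarity: fraction of box labels that match position-wise."""
    matches = sum(x.strip().lower() == y.strip().lower() for x, y in zip(a, b))
    return matches / 2


def grade(answer: List[Association], scheme: Dict[Association, float]) -> float:
    """Sum, over solution associations, the highest similarity-scaled mark."""
    total = 0.0
    for solution_assoc, mark in scheme.items():
        # Missing features score 0; extraneous answer associations are never the
        # best match for anything, so they are effectively ignored.
        best = max((similarity(ans, solution_assoc) for ans in answer), default=0.0)
        total += mark * best
    return total


# Hypothetical marking scheme: one mark per association in the model solution.
scheme = {
    ("fetch", "decode"): 1.0,
    ("decode", "execute"): 1.0,
    ("execute", "write"): 1.0,
}
# Hypothetical student answer with one malformed and one extraneous association.
answer = [("Fetch", "Decode"), ("decode", "execute"), ("execute", "store"), ("note", "title")]
print(grade(answer, scheme))
```

In this toy case two associations match fully and one matches on a single box, so the answer earns 2.5 of the 3 available marks, and the extraneous association contributes nothing.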

4 Preliminary Evaluation 

In an on-line mock exam, 13 students used a simple drawing tool to create their 
answers to the question posed in figure 1. The answers were graded by the tool 
and by four independent human markers. The results are given in table 1; the 
maximum mark for the question was 4.0. The Pearson correlation coefficient 
for these data is 0.75 which is significant for n = 13 at the p < 0.01 level 
(2-tailed), indicating that the automatic grader performs very similarly to the 
human markers. 
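
The significance claim can be checked with the standard t transformation of a correlation coefficient, t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom. The sketch below uses only the reported r and n; the critical value 3.106 (two-tailed, α = 0.01, 11 degrees of freedom) is taken from standard t tables, not from the paper, and the function name is an arbitrary choice.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r from n paired observations."""
    return r * math.sqrt((n - 2) / (1 - r * r))


r, n = 0.75, 13          # values reported in the text
T_CRIT = 3.106           # two-tailed critical t for alpha = 0.01, df = 11 (standard tables)

t = t_from_r(r, n)
print(f"t = {t:.2f}, significant at p < 0.01: {t > T_CRIT}")
```

The statistic comes out at about 3.76, above the critical value, which agrees with the significance level stated in the text.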

Despite the small sample, the results are sufficiently encouraging to suggest 
that this method of diagram grading is a feasible approach to the problem. 
Therefore, we intend to apply this method to more complex domains (such as E- 
R diagrams). We also need to test whether aggregation will be of real value, and 
to devise alternative techniques in the identification and interpretation stages. 

Abstract. By building computational models, Larkin and Simon (1987) 
showed that the effects of locational indexing give an explanation of 
‘Why a diagram is (sometimes) worth ten thousand words', to quote the 
title of their seminal paper. This paper reports an experiment in which 
participants solved three versions of Larkin and Simon's simple pulley 
system problem with varying complexity. Participants used a 
diagrammatic, tabular or sentential representation, which had different 
degrees of spatial indexing of information. Solutions with the diagrams 
were up to six times easier than informationally equivalent sentential 
representations. Contrary to predictions derived from the idea of 
locational indexing, the tabular representation was not better overall 
than sentential representation and the proportional advantage of the 
diagrammatic representation over the others did not increase with 
problem complexity. This suggests that the advantage of diagrams goes 
beyond the effects that locational indexing has on the processes of 
searching for items of information and the recognition of applicable 
rules. A possible explanation resides in the specific problem solving 
strategies that the participants may have been using, which depended on 
the structure of the representations and the extent to which they 
supported solution path recognition and planning. 

1 Introduction 

The nature of the representations used for any non-trivial task will greatly 
impact upon, or even substantially determine, the overall difficulty of doing the 
task. This is now a well-established and accepted finding in Cognitive Science 
and allied areas, and building upon this, our understanding of the nature of 
representations is growing. In superficially different problems with the same 
underlying structure, the form of representation used can make finding a solution 
take up to 16 times as long [6]. External representations provide benefits such 
as being able to off-load computation on to the representation [7], [9]. Using 
orthogonal visual dimensions for distinct dimensions of information in a 
graphical display can make problem solving easier by supporting the 
separation of those information dimensions in the mind [10]. Scientific 
discoveries are often made possible through the invention of new 
representations [1], [3]. Conceptual learning can be enhanced by alternative 
representations that more clearly reveal the underlying structure of scientific 
and mathematical topics, by deliberately using representational schemes that 
encode the underlying structure in a transparent fashion [2]. 

1.1 ‘Why a Diagram Is (Sometimes) Worth 10,000 Words' 

Larkin and Simon's 1987 seminal paper [8], which has the above title, is 
arguably the seed around which much of this research area in cognitive 
science has crystallized. That paper provided critical insights into the 
potential benefits of diagrammatic representations over propositional or 
sentential representations. Larkin and Simon built computational (production 
system) models for two domains, simple pulley problems and geometry, which 
explained why diagrams are often computationally better than informationally 
equivalent sentential representations. In diagrams, items of information that 
are likely to be processed at the same time in a problem solution are often 
found together spatially. This locational indexing yields benefits both in the 
search for information and the recognition of what rules to apply. Further, 
this reduces the need to laboriously match symbolic labels. 

The starting point for the Larkin and Simon paper is the assumption that 
diagrams are often better than sentential representations and the relative 
advantage of diagrams needs to be explained. Given the importance of this 
paper, the consequences of that assumption being wrong would be substantial. 
It would undermine Larkin and Simon's claims and could draw into question 
the subsequent work built upon it. However, there has apparently been no 
direct empirical evaluation to demonstrate that the diagrams in the paper are, 
in reality, found by people to be better for problem solving than sentential 
representations. This paper presents an experiment using simple pulley 
system problems, like those used in the Larkin and Simon paper, to 
investigate the relative difficulty of solution within different representations. 

Reassuringly, the experiment demonstrated that humans find the simple 
pulley system problems substantially easier to solve with diagrams 
than with equivalent sentential representations. Of course, this finding is not really 
a surprise. Rather, the main aims of the experiment were actually to: (1) 
quantify the benefits of the diagram for the pulley problem over problems of 
varying complexity; and, (2) explore the role of locational indexing in more 
detail, by comparing the diagrammatic and sentential representations with a 
tabular representation that is midway between the others in the extent to 
which it uses spatial location as a means to index information. 

1.2 Isomorphic Problems with or without Equivalent Rules 

Larkin and Simon define a representation as consisting of a format and a set 
of operators and heuristics [8]. The format is the graphical (or notational) 
structure of the display (or expressions). The operators are applied to change 
the components of the display or to change mental states encoding the 
display. Heuristics guide the selection of suitable operators to use. All three, 
format, operators and heuristics, are essential for a representation since 
together they determine the nature of the problem space experienced by the 
problem solver. For instance, given a set of numbered concentric circles, the 
representation and problem are very different if the user brings ‘Venn diagram' 



Why Diagrams Are (Sometimes) Six Times Easier than Words 245 



set theoretic operators or arithmetic operators for ‘target shooting' to the 
interpretation of the display. 

It has been shown that isomorphic problems can be over an order of 
magnitude more difficult under alternative representations. For example, 
Kotovsky, Hayes and Simon [6] used variants of the Tower of Hanoi problem, 
such as the monster-move and monster-change problems, which had different 
operators or rules. In the monster-move problem, globes were passed back and 
forth between the monsters in a manner similar to the transfer of disks 
between pegs in the original Tower of Hanoi problem. In the monster-change 
problem, the change in size of globes held by each monster was equivalent to 
the transfer of disks. Although the underlying logical structure of the 
problems was always the same, with equivalent size and structure of problem 
state spaces, the different rules, productions, required for the alternative 
versions of the problem imposed substantially different loads on working 
memory. This impacted substantially on the overall difficulty of the tasks. In 
the move version of the problem, the image of the location of the globes/disks 
provided strong visual cues about what moves are legal to make, whereas in 
the change version of the problem, logical mental inferences were required to 
work out permitted legal states. 




Fig. 1. Diagrammatic representation for the medium complexity pulley system 
problem used in the experiment. The alphanumeric symbols are not present in the 
experimental stimulus (except ‘4' and ‘?') but are present to enable readers to cross 
reference components to the tabular and sentential representations in Figs. 2 and 3 

In the Tower of Hanoi studies the problems were isomorphic but the 
representations were different as the rules were experimentally manipulated. 
This raises the question of how much the difficulty of problems may differ 
when the format of the display varies but the rules remain the same. The present 
experiment examines alternative representations of simple pulley problems to 
quantify the relative impact of different representations with equivalent rules 
but under alternative formats. The formats are diagrams, tables and 
sentential representations. Problems of three complexity levels were designed. 
Figs. 1, 2 and 3 show examples of a diagram, a tabular and a sentential 
representation as used in the experiment. In the problems, the participants 
are given the value of one of the weights and they compute the value of 
another unknown weight, assuming that the system is in static equilibrium. 

Fig. 2. Table representation (medium complexity) as used in the experiment. Columns 
co-reference components in the rows, as indicated by ‘x' symbols. Cells with ‘[ ]' 
indicate that components are part of the same assembly and that the value is to be completed 
by the problem solver. The final value to be found is indicated by ‘?'. Arrows are explained in 
the discussion 







... anchored to the floor. 

On the other side of the pulley P3 the rope, Rg, supports a weight W2. 



Fig. 3. Sentential representation for the medium complexity pulley system. 
Emboldened items and lines are not shown in the experimental stimulus and are 
explained in the text 

Fig. 4 lists the common rules for solving the problems. The four rules are 
used to compute incrementally the tensions in the strings connecting the 
weights, pulleys, ceiling and floor. For the problem in Figs. 1, 2 and 3, the 
minimum set of rule applications is: (1) PR1 to find the tension in rope Rb; 
(2) PR2 to find the tension in rope Rc; (3) PR3 to find the tension in rope Re; 
(4) PR3 to find the tension in rope Rg; (5) PR1 to find the value of weight 
W2. Production rule PR4 is not needed in this problem, as there is no weight 
hanging from two ropes. 



PR1. A single rope supporting a weight will take the value of that weight as long as 
no other rope is supporting it. 

PR2. In a pulley system the two ropes over, or under, the same pulley must have the 
same value. 

PR3. The pulley takes the value of the sum of the two ropes over or under it. This 
same value is then taken by a rope supporting it or that hangs from it. 

PR4. When a weight is supported by two ropes, its value is equal to the sum of the 
values of those two ropes. 



Fig. 4. Common set of production rules for solving pulley problems 

Consider step 3 in more detail to contrast how the rule PR3 is interpreted 
in each representation. The tension in rope Rc is known, either because it has 
just been computed or because a value is written for it in the representation 
(‘4' if the solution is correct so far). With the diagram, Fig. 1, we see that Rc 
is connected to the centre of pulley P2 and see there are two ropes Rd and Re 
hanging from the sides of P2, hence the tension in each rope is 2, half that of 
Rc. With the table, Fig. 2, row 14 would have ‘4' written in the Hangs-Rc 
cell so one would repeat this value for pulley P2 and for P2 in row 15. Then 
‘2' would be entered in each of the cells for Rd and Re, and this value 
repeated in row 21 for Re. In the sentential representation, Fig. 3, it is a 
matter of finding the statement about what hangs from Rc, Pulley P3, then 
searching for the statements about which ropes hang from P3, and then 
writing the value beside each one. 
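
The production rules of Fig. 4 and the first three steps of the solution can be written out as a small sketch. It is only an illustration of the arithmetic: the function names are invented here, the given weight is taken to be 4 (the value the text says is written for Rc when the solution is correct so far), and steps 4 and 5 are left out because the full topology of the system is not spelled out in the text.

```python
def pr1(weight):
    """PR1: a single rope supporting a weight takes the value of that weight."""
    return weight


def pr2(rope):
    """PR2: the two ropes over, or under, the same pulley have the same value."""
    return rope


def pr3_sum(rope_a, rope_b):
    """PR3 read forwards: the pulley, and a rope supporting it or hanging from it,
    takes the sum of the two ropes over or under the pulley."""
    return rope_a + rope_b


def pr3_split(centre_rope):
    """PR3 read backwards (as in step 3): the two ropes at the sides of a pulley
    are equal by PR2 and sum to the rope at its centre, so each carries half."""
    return centre_rope / 2


def pr4(rope_a, rope_b):
    """PR4: a weight supported by two ropes equals the sum of their values."""
    return rope_a + rope_b


given_weight = 4                      # assumed value of the known weight
tension = {}
tension["Rb"] = pr1(given_weight)     # step 1
tension["Rc"] = pr2(tension["Rb"])    # step 2
tension["Rd"] = tension["Re"] = pr3_split(tension["Rc"])  # step 3
print(tension)
```

The sketch reproduces the values in the text: Rc is 4, and Rd and Re each carry 2, half the tension of Rc. The rules are identical in every representation; what changes is how much searching is needed to find the components each rule applies to.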

1.3 Locational Indexing in Diagrams, Tables and Sentential Representations 

In addition to the diagrammatic and sentential representations that Larkin 
and Simon considered, the experiment uses a tabular representation in order 
to investigate why the problem difficulty varies with alternative 
representations. According to Larkin and Simon, diagrams are often better 
than sentential representations because items of information that are needed 
for inferences are often spatially co-located. Such location indexing facilitates 
the finding of relevant information and the picking of appropriate rules to 
apply. It is instructive to compare how easy it is (a) to find all the 
propositions in Fig. 3 that refer to components that are part of pulley system 
P2 and (b) to identify the same components in Fig. 1. Similarly, if one 
knows only the value of rope Rc, deciding which rule to apply given the 
sentential representation is less obvious than in the diagram. 

The tabular representation was designed to exploit some of the benefits of 
location indexing, but rather than using co-location as in the diagrams, table 
rows and columns group items of related information together more 
systematically than in the sentential representations, but without necessarily 
having them in close proximity to each other, nor necessarily separating them 
from unrelated items. Presumably, searching for information is easier in the 
table than the sentential representation and knowing which rules are 
applicable is facilitated by the patterns of cells so far completed. Hence, it 
may be predicted that the ease of problem solving with the tabular 
representation should be more similar to the diagram than the sentential 
representation. 

Knowing which rules to apply is an important aspect of this task, so 
different problems were designed for the experiment that required different 
numbers of rule types, but about the same minimum number of rule 
applications to solve. Figs. 5a and 5b show, respectively, the simple and 
complex forms of the problem as diagrams. Fig. 1 shows the medium 
complexity version. Note that the number of pulleys, ropes and weights is the 
same for each problem. The complex problem requires all four rules in Fig. 4, 
the medium complexity problem does not require rule PR4, and the simplest 
problem does not require either PR3 or PR4. It might be expected that the 
greater the complexity of a problem the harder it would be to solve and that 
the relative proportional benefit of the diagrams and the tables over the 
sentential representations will increase with increasing complexity. 





Fig. 5. Diagrammatic representations of (a) the simple and (b) the complex variants 
of the pulley system problem. See Fig. 1 for the medium complexity problem 

2 Experiment 

The first aim of the experiment was to determine how much harder people 
find solving pulley system problems with the sentential representation 
compared to the diagrammatic representation. The second aim was to test 
the hypotheses derived from explanations of the differences between 
alternative representations based on locational indexing. It was predicted 
that: (1) the tabular representation would be more like the diagrams than the 
sentential representations; (2) the benefits of the diagrams and table over the 
sentential representation would interact with complexity, showing a magnified 
proportional advantage as problem complexity increased. 

2.1 Design and Participants 

The independent variables of the experiment were representation format and 
the problem complexity. A mixed 3×3 design was used with representation 
format (diagram, table, sentence) being a between participants factor and 
complexity (simple, medium, complex) serving as a within participants factor. 
The dependent variable was solution time. Success rate was not used as a 
dependent variable, because performance was at ceiling in all nine conditions, 
as expected. 

The 36 participants were volunteer non-engineering undergraduate students 
from the University of Nottingham. They were randomly allocated to one of 
the three representational format conditions, with the proviso that the group sizes 
were made equal. 
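
Random allocation with equal group sizes can be sketched as a shuffle followed by an even split. The sketch is illustrative only: the paper does not say how the allocation was carried out, and the function name, participant numbering, and seed are assumptions.

```python
import random


def allocate(participants, conditions, seed=None):
    """Randomly assign participants to conditions so that all groups are the same size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    size, remainder = divmod(len(shuffled), len(conditions))
    if remainder:
        raise ValueError("participants cannot be split into equal groups")
    # Consecutive slices of the shuffled list form the groups.
    return {
        condition: shuffled[i * size:(i + 1) * size]
        for i, condition in enumerate(conditions)
    }


# 36 participants (numbered 1..36 for illustration) and the three format conditions.
groups = allocate(range(1, 37), ["diagram", "table", "sentence"], seed=0)
print({condition: len(members) for condition, members in groups.items()})
```

With 36 participants and three formats, each group receives 12 participants, matching the group size reported in the results.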

2.2 Materials and Procedure 

The experimental session for each participant lasted no more than an hour 
and consisted of three phases. The first involved familiarization with the 
general nature of the task and the production rules in particular, as shown in 
Fig. 4. The second phase included training on each of the productions using 
mini-problems that comprised a small number of components. For each 
production there was a worked example and two problems that were 
completed by the participant, with assistance from the experimenter if 
necessary. The training examples were printed on sheets of paper. In order 
to make the training materials easily comprehensible the examples were 
presented using written descriptions and a diagram. Table participants were 
additionally shown tables for the mini-problems. However, the participants 
were only permitted to write on the particular representation that they would 
be using in the test phase. None of the participants had any difficulty learning 
how the productions worked. 

In the main test phase the three problems of varying complexity were 
presented on printed sheets to the participants in a random order. The 
stimuli had only the one representation and the participants were instructed 
to use only that representation to solve the problem. The solution times were 
recorded. 

3 Results 

All the participants completed all the problems. Fig. 6 shows the median 
times for problem completion for each condition, with bars for quartiles giving 
information about the distributions of values. As can be seen, data for some 
conditions are highly skewed and in opposite directions. Thus, non- 
parametric statistics have been chosen for the analysis of the data and a 
conservative level of significance set, α=.01 (all tests are p<.01, unless 
otherwise stated). The reality of the impression given by Fig. 6 that for each 
level of complexity the significant order of difficulty of solution is sentence- 
table-diagram is supported by Jonckheere tests for order alternatives (3 
conditions, 12 participants per group: low complexity, S=304; medium, 
S=248; high, S=372). Similarly, for the diagram and sentential 
representations the expected order of increasing difficulty with greater 
complexity is present, as confirmed by Page tests for ordered alternatives 
(diagram, L=164; sentence, L=162.5). However, the increase in solution 
times for the tabular representation is only marginally significant (L=155.5, 
p<.05). 
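
For readers unfamiliar with the Jonckheere test, the sketch below computes one common form of its statistic, S = P − Q, where P counts cross-group pairs that follow the predicted order and Q counts pairs that contradict it. Whether the S values reported above use exactly this form is an assumption, and the solution times in the example are invented toy values, not data from the experiment.

```python
from itertools import combinations


def jonckheere_s(groups):
    """S = P - Q for samples listed in the predicted order of increasing values."""
    concordant = discordant = 0
    for lower, higher in combinations(groups, 2):
        for x in lower:
            for y in higher:
                if y > x:
                    concordant += 1   # pair agrees with the predicted order
                elif y < x:
                    discordant += 1   # pair contradicts the predicted order
                # ties count for neither
    return concordant - discordant


# Toy, invented solution times (seconds), ordered diagram < table < sentence.
toy = [[10, 12, 11], [14, 13, 18], [20, 17, 25]]
print(jonckheere_s(toy))
```

With three groups of 12 there are 3 × 144 = 432 cross-group pairs, so under this form of the statistic S can range from −432 to 432, and larger positive values indicate stronger agreement with the predicted order.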




Fig. 6. Median solution times for the three representation types by the three levels of 
complexity. The bars for each point show first and third quartiles. For clarity the 
bars for the table data points are thicker lines 

As both the Jonckheere and Page tests merely determine whether at least 
one of the medians is greater than the next in order, and to investigate the 
possibility of an interaction of the representation and complexity factors, 
consider comparisons of the solution times between pairs of conditions. 
Tables 1 and 2 summarize pairwise Wilcoxon and Mann-Whitney tests on 
combinations of complexity and representation conditions. For the table 
representation there are no significant differences between adjacent levels of 
complexity (at α=.01). For the sentence representation there is a significant 
difference in difficulty between the medium and high complexities only. The 
reverse is true of the diagram representation, with a significant difference 
occurring between the low and medium complexities. This is suggestive of an 
interaction between complexity and representation. 



Table 1. Summary of Wilcoxon tests on pairs of levels of complexity under each of the 
representations 

Comparison    Sentence        Table             Diagram 
Low-medium    T=28.0, n.s.    T=23.0, n.s.      T=0.0, p<.01 
Medium-high   T=0.0, p<.01    T=13.0, (p<.05)   T=22.0, n.s. 





Table 2. Summary of Mann-Whitney tests on pairs of representations for each level of 
complexity 

Comparison       Low             Medium          High 
Sentence-table   U=63.5, n.s.    U=64, n.s.      U=14.0, p<.01 
Table-diagram    U=0.0, p<.01    U=13.0, p<.01   U=15.5, p<.01 



Table 2 shows that there are significant differences between the solution 
times of the diagram users and the table users at all three levels of problem 
complexity. The difference between table users and the sentence users is only 
significant at the high complexity level. This implies that the tables are more 
similar to the sentential representations than they are to the diagrams. 

Fig. 7. Comparison of relative difficulty of solutions with different pairs of 
representations across the levels of complexity 

Fig. 7 shows the ratios of the solution times for each pair of 
representations for the three problem complexity levels, which illustrates the 
relative proportional difficulty of solution. The sentential representation is 
approximately four to six times harder than the 
diagrammatic representation. The tables are between three and five times 
harder than the diagrams. The greatest advantage of the diagram 
representation over the others is on the problem with the lowest complexity 
and not as predicted on the problem with the greatest complexity. The 
relative advantage of the tables over the sentential representations is no more 
than a factor of two. 

4 Discussion 

In the classic studies on isomorphs of the Tower of Hanoi the problem 
difficulty varied by up to 16 times. Unlike those studies, the different 
representations in this experiment varied only in the format of the 
representation and not in the set of rules used by the participants. 
Nevertheless, solving the pulley problem with the tabular and sentential
representations took, respectively, up to five or six times as long as with the
diagrammatic representation. Just changing the format of a representation
produces a substantial effect, which is something that needs to be explained. 

Whereas the overall pattern of problem difficulties was as expected, the 
detailed comparisons of performance challenge the predictions derived from
considerations of locational indexing. On this basis, it was expected that the 
table representations would be superior to the sentential representation and 
comparable to the diagrammatic representation, because the structure of the 
table provides a strong spatial indexing of the information through the use of 
rows and columns, even though the Euclidean distance between related items 
is not necessarily small. The difference between the table and diagram cannot 
simply be explained by a failure of the table participants to use the indexing 
system of the table, because the success rate was high and finding information 
in tables is a well-practiced skill for university undergraduates. Thus, it is 
inferred that the effect of locational indexing on the search for items of 
information does not provide a full account of the benefit of diagrams. 

Similarly, the prediction that the benefits of locational indexing for
diagrams would be greater the more complex the problem was not supported.
There is roughly a one-fold decrease in the proportional difficulty of the
sentential representation compared to the diagram with each increase in level of
complexity. The complexity of the problems varied most substantially in
terms of the number of types of rules that were needed to solve the problems.
The easier recognition of rules due to locational indexing does not seem to have
been a major factor in the relative problem difficulty between representations.

How can this state of affairs be explained? A possible account for the 
failure of both predictions is to consider the specific nature of the 
representations and the problem solving strategies that are being used with 
each of them. A clue comes from the lack of a significant increase in the 
solution time with the table representation with increasing problem 
complexity, although the medians do have an increasing trend. This implies 
that the table participants are using an approach that is relatively 
independent of the complexity of the problems. Examining the tables for the
three problems, it is obvious that they are all more similar to each other
than the diagrams are to each other (which is why only one table is
reproduced here). The minimal solution path for the medium complexity
table is shown by the arrows in Fig. 2. Note that there are alternative paths
branching off the minimal path that do not contribute to the required answer.
It seems that the participants may have been progressing through the table
completing all the empty cells without regard for the minimal path.
Inspection of the solution sheets used by the participants confirms this was
the case. In 34 out of the 36 solutions produced, all the cells were completed.
In the two cases where some cells were not completed the participants
appeared occasionally to be following the minimal path, but in so doing they
would have needed to search for the dead-end branches and reason
about avoiding them. Thus, it is clear that all the table users executed
exhaustive searches and nearly all completed all the information in the tables. 
This would have added substantially to the computation time needed for 
solution, even though information required for any particular inference would 
have been easy to locate using the indexing of rows or columns. 

In contrast, the diagram participants seem to have been able to follow the
minimal path through their representation. Examining Figs. 1, 5a and 5b,
the overall pattern of connectivity of ropes and pulleys throughout the system is
easily comprehended, so it is possible to readily spot paths that do not
progress towards the unknown weight. Hence, a more efficient strategy that
follows the minimal path could be executed, unlike with the tabular representation.

[Figure: sentential description of the complex pulley problem, overlaid with lines linking co-referring components]

Fig. 8. Complex problem in the sentential representation. Lines show co-references to 
the same components of the pulley system 

The ease of comprehension of the structure of the pulley system also
explains why the relative advantage of the
diagram representation was greatest with the simplest problem, rather than
the most complex, contrary to what was predicted by a straightforward 
locational indexing explanation. In addition to requiring only two rules to 
solve, the simple problem is symmetric and a single string connects the given 
and unknown weights. Both of these features are obvious in Fig. 5a but are
not apparent in either the tabular or sentential representation for the same 
problem. Spotting that the problem is symmetrical, a diagram participant 
may simply have inferred that the given and unknown would be identical. 
Spotting that the same string directly connects both weights, a participant 
may simply infer that the constant tension means the weights will be equal. 
These shortcut inferences are not possible with the diagrams for the more
complex problems. Hence, the greatest relative benefit of the diagram, about 
six times faster than the sentential representation and five times faster than 
the table, was on the simplest problem. 

Locational indexing does provide an explanation of the jump in difficulty of
the complex problem in the sentential representation compared to the other
problems under the same representation. The lines in Figs. 3 and 8 show the
distribution of labels for the components that are involved in the minimal
solution to the medium and complex problems, respectively. Clearly, search
for information in the simple problem will be easier than in the complex one,
because items of information about individual components in the simple
problem are separated into different regions of the representation, whereas
they are mingled amongst each other and more widely dispersed in the
complex representation. As expected, this has a more than proportional effect
on the difficulty of the complex problem.

Locational indexing does provide a partial explanation of why some types of
representation are better than others and the impact this has on solutions to 
problems with different complexities. However, to adequately account for the 
pattern of results obtained in this experiment it was necessary to consider the 
strategies that can be used with each representation in general and the 
specific characteristics of representations of particular problems. The nature 
of the representations impacts on how their users solve the problems, with the 
diagram supporting the selection of information pertaining to the most direct 
path between initial state and goal, whereas the table encourages less direct,
exhaustive inference over all of the information, because the overall structure
of the system and problem is hidden, so apparently preventing quick look-ahead
and planning. The opacity of the table, in this respect, is similar to
that of the sentential representation. The similarity of performance on the 
tabular and sentential representations, plus the contrast with the 
diagrammatic representation, suggests the benefit to problem solving of a 
representation that reveals the overall structure of a problem may be at least 
as important as the use of locational indexing of information.

This conclusion is consistent with previous claims about representational 
systems and problem solving. Koedinger and Anderson [5] have shown 
empirically and computationally the importance of diagrammatic 
configuration schemas (DCS) as memory structures that expert geometry 
problem solvers use to encode their knowledge. A DCS has a diagram (e.g., 
weight suspended from two ropes) around which information is stored about 
the relations (e.g., weight is sum of the rope tensions) and necessary
conditions for inferences to be made using the rules (e.g., both tensions given).
By visually matching DCS diagrams with parts of the problem diagram, valid 
inferences can quickly be identified, which supports planning and a working 
forward strategy. Koedinger [4] suggests that DCSs may have a similar role 
in diagrams for pulley system problems. The outcome of the present 
experiment supports this claim and it is noteworthy as participants were not 
experts in mechanics or the domain, although they did receive training. It is 
possible that the table participants had also acquired and used something 
equivalent to DCSs but for cell configurations in the tables. However, the
potential benefit that table users could gain from table configuration schemas 
is likely to be lower than DCSs, because patterns of cells in a table are less 
easy to discriminate than images of pulleys, ropes and weights. 

Finally, there are implications for the design of effective representational 
systems to support problem solving and learning. First, although Larkin and 
Simon's [8] explanation of the value of diagrams must be taken as part of a 
larger and more complex account, the design of a good representation should 
use locational indexing as a means to coordinate information that will be useful
to problem solving. It is unlikely that a representation without such a scheme 
will be effective. Second, the design of the representation should attempt to 
make the underlying structure of the problem readily apparent to the user. 
Such a representation is likely to support planning and rule selection well. 
This is consistent with Cheng's [2] claims about the nature of effective 
representational systems. Specifically, an effective representation should 
attempt to encode the underlying relations, or meaning, of a domain directly 
in the structure of its representational schemes. Representations having such
semantic transparency have been shown to substantially improve
conceptual learning in a number of knowledge-rich topics in science and
mathematics [2].

Abstract. The rapidly increasing availability of electronic publications
containing information graphics poses some interesting challenges in 
terms of information access. For example, visually impaired individu- 
als should ideally be provided with access to the knowledge that would 
be gleaned from viewing the information graphic. Similarly, digital li- 
braries must take into account the content of information graphics when 
constructing indices. This paper outlines our approach to recognizing the 
intended message of an information graphic, focusing on the concept of 
perceptual task effort, its role in the inference process, our rules for es- 
timating effort, and the results of an eye tracking experiment conducted 
in order to evaluate and modify those rules. 



1 Introduction 

Information graphics (line graphs, bar charts, etc.) are pervasive in popular me- 
dia such as newspaper and magazine articles. The rapidly increasing availability 
of electronic publications poses some interesting challenges in terms of informa- 
tion access. For example, individuals with impaired eyesight have limited access 
to graphical displays, thus preventing them from fully utilizing available informa- 
tion resources. Information graphics also provide a challenge when attempting 
to search the content of mixed-media publications within digital libraries. 

Our research involves recognizing the graphic designer’s communicative in- 
tention for a particular information graphic. Our analysis of a corpus of infor- 
mation graphics from popular media sources indicates that information graphics 
generally have a communicative goal and that this intended message is often not 
conveyed by accompanying text. Thus recognizing the intended message of an 
information graphic is crucial for full comprehension of a mixed-media resource. 
Our project’s overall goal is two-fold: 1) to provide alternative access to informa- 
tion graphics for visually impaired users and 2) to provide access to publications
in digital libraries via the content of information graphics. For visually impaired
users, we are designing an interactive natural language system that provides an 
initial summary that includes the information graphic’s intended message along 
with notable features of the graphic, and then responds to follow-up questions 
from the user [1]. For digital libraries, the initial summary of the graphic will be
used in conjunction with the document text to provide a more complete repre- 
sentation of the content of the document to be used for searching and indexing. 

Although some projects have attempted to make images accessible to visu- 
ally impaired viewers by reproducing the image in an alternative medium, such 
as soundscapes [14], these approaches are ineffective with complex information 
graphics; moreover, they require the user to develop a “mental map” of the in- 
formation graphic, which puts congenitally blind users at a disadvantage since 
they do not have the personal knowledge to assist them in the interpretation of 
the image [8]. The underlying hypothesis of our work is that alternative access
to what the graphic looks like is not enough: the user should be provided with
the message and knowledge that one would gain from viewing the graphic in 
order to enable effective and efficient use of this information resource. 

This paper first outlines our overall approach to inferring the communicative 
message of an information graphic as well as various types of evidence (caption, 
highlighting, and a user model) that can aid the inference process. It then focuses 
on one specific type of evidence, perceptual task effort, discusses its role in 
recognizing the graphic’s intended message, describes our rules for estimating 
perceptual task effort, and presents the results of an eye tracking experiment 
conducted in order to evaluate and revise our effort estimates. 

2 Recognizing the Graphic Designer’s Intended Message 

As Clark [3] noted, language is more than just words. It is any “signal” (or lack 
of signal when one is expected), where a signal is a deliberate action that is 
intended to convey a message. Language research has posited that a speaker or 
writer executes a speech act whose intended meaning he expects the listener to be 
able to deduce, and that the listener identifies the intended meaning by reasoning 
about the observed signals and the mutual beliefs of author and interpreter [6,3]. 
Applying Clark’s view of language to information graphics, it is reasonable to 
presume that the author of an information graphic similarly expects the viewer 
to deduce from the graphic the message that he intended to convey by reasoning 
about the graphic itself, the salience of entities in the graphic, and mutual beliefs. 

Beginning with the seminal work of Allen [16] who developed a system for 
deducing the intended meaning of an indirect speech act, researchers have applied 
plan inference techniques to a variety of problems associated with understanding 
utterances, particularly utterances that are part of a dialogue. Given domain 
knowledge in the form of operators that decompose goals into a sequence of 
subgoals, along with evidence in the form of an observed action (such as an 
utterance), a plan inference system chains backwards on the plan operators to 
deduce one or more high-level goals that might have led the agent to perform the
observed action as part of an overall plan for achieving his goal(s). The high-level
communicative goals in the plan capture the utterance’s intended meaning. 

In their work on intelligent multimedia generation, the AutoBrief group pro- 
posed that speech act theory can be extended to the generation of graphical 
presentations [9]. When designing an information graphic, the designer has one
or more high-level communicative goals. Consequently, he constructs an informa- 
tion graphic that he believes will enable the viewer to perform certain perceptual 
and cognitive tasks which, along with other knowledge, will enable the viewer to 
recognize the intended message of the graphic [9]. By perceptual tasks we mean 
tasks that can be performed by simply viewing the graphic, such as finding the 
top of a bar in a bar chart; by cognitive tasks we mean tasks that are done via 
mental computations, such as computing the difference between two numbers. 

In our research, we extend plan inference techniques (that have been used 
successfully on natural language discourse) to inferring intention from infor- 
mation graphics. Our plan operators capture knowledge about how the graphic 
designer’s goal of conveying a message can be achieved via the viewer performing 
certain perceptual and cognitive tasks, as well as knowledge about how percep- 
tual and cognitive tasks decompose into sets of simpler tasks. Using these plan 
operators, we can chain from evidence provided by the information graphic to 
eventually reach a high-level goal that captures the underlying message of the 
graphic in the same way that plan inference systems chain from a speech act to 
the probable goals of an utterance. Input to our plan recognition system consists
of an XML representation of the graphic as provided by a vision module [1].
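
The chaining step described above can be illustrated with a minimal sketch. It is not the system described in this paper: the operator table, the goal names (`convey-maximum`, `convey-value`, `find-maximum-bar`) and the function name are hypothetical, and the sketch ignores the XML input, operator constraints, and the weighing of evidence. It only shows how one can chain from an evidenced perceptual task up through plan operators to the candidate high-level goals.

```python
# Hypothetical plan operators: each goal decomposes into simpler tasks.
PLAN_OPERATORS = {
    "convey-maximum": ["find-maximum-bar", "perceive-label"],
    "find-maximum-bar": ["perceive-relative-diff"],
    "convey-value": ["perceive-dependent-value", "perceive-label"],
}


def candidate_goals(evidence_task, operators):
    """Chain upward from an evidenced task to the top-level goals whose
    decomposition contains it, directly or indirectly."""
    # Invert the operators: for each subtask, which goals use it?
    parents = {}
    for goal, subtasks in operators.items():
        for sub in subtasks:
            parents.setdefault(sub, set()).add(goal)

    top_level = set()
    frontier = [evidence_task]
    seen = set()
    while frontier:
        task = frontier.pop()
        if task in seen:
            continue
        seen.add(task)
        # A task that no operator uses as a subtask is a top-level goal.
        if task not in parents and task != evidence_task:
            top_level.add(task)
        frontier.extend(parents.get(task, ()))
    return top_level


goals_from_comparison = candidate_goals("perceive-relative-diff", PLAN_OPERATORS)
goals_from_label = candidate_goals("perceive-label", PLAN_OPERATORS)
```

In this toy table, evidence for a comparison task leads to a single candidate goal, whereas evidence for reading a label is ambiguous between two goals, which is why additional sources of evidence are needed to choose among hypotheses.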

In extending plan inference techniques to the recognition of intentions from 
information graphics, we need to identify the types of evidence that will be used 
in the plan inference process. In plan recognition systems involving dialogue, 
the evidence is naturally centered around the utterances, and the inference pro- 
cess proceeds incrementally as the dialogue unfolds, using evidence such as the 
surface form of the utterance, the focus of attention in the dialogue, etc. When 
dealing with information graphics, the viewer is presented with the entire in- 
formation graphic, and a decision needs to be made as to which aspects of the 
graphic should be used as evidence of the graphic designer’s intentions. Follow- 
ing AutoBrief [9], we contend that when constructing the graphic, the designer 
made certain design decisions in order to make “important” tasks (the ones that 
the viewer is intended to perform in getting the graphic’s message) as easy or 
as salient as possible. By reasoning about these design decisions, we can glean 
information about the graphic designer’s intended message for the graphic. The 
graphic designer can make a task easy for the viewer to perform by the choice of 
graphic type (for example, bar chart versus line graph [20]) and the organization 
and presentation of data. This observation has led us to include perceptual task 
effort as one of the sources of evidence in our plan inference process; this particu- 
lar type of evidence is the focus of this paper, and is discussed further beginning 
in Section 3. The graphic designer might also intend a task to be particularly 
salient to the viewer. We have identified three sources of evidence which allow us 
to reason about the tasks that the graphic designer intended to be salient for the
viewer: captions, highlighted entities in the information graphic, and a model of
mutual beliefs about entities of interest to members of the viewing audience. 

Well-chosen captions can be useful indicators of the intended message of an 
information graphic. Consider, for example, the graphic on the left in Figure 4. If 
this graphic had the caption “Penny Pinching in 2000,” this would indicate that 
the bar representing 2000 is a particularly salient item in the graphic, whereas 
if the caption read “Capital Expense Peaks in 1999” this would indicate the 
salience of the bar representing 1999 and the task of finding the maximum in the 
graph. Therefore, we use noun phrases in captions as an indication of the salience 
of particular items in the graphic, and verb phrases to indicate the salience of 
particular tasks. One might wonder why we do not deal almost exclusively with 
captions to infer the intentions of the information graphic. Corio [5] performed
a large corpus study of information graphics and noted that captions often do not 
give any indication of what the information graphic conveys. Our examination 
of a collection of graphics supports his findings. Thus we must be able to infer 
the message underlying a graphic when captions are missing or of little use. 

Graphic designers also use techniques to highlight particular aspects of the 
graphic, thus making them more salient to the viewer. Such techniques include 
the use of color or shading for elements of a graphic, annotations such as an 
asterisk, an arrow pointing to a particular location, or a pie chart with a single 
piece “exploded.” Our working hypothesis is that if the graphic designer goes to 
the effort of employing such attention-getting devices, then the highlighted items 
are almost certainly part of the intended message. Thus we treat the highlighted 
entities as suggesting instantiations of primitive perceptual tasks that produce 
particularly salient tasks. Suppose for example that there was no caption on the 
information graphic shown on the left in Figure 4, but that the bar for 2000 was 
highlighted by shading it darker than the other bars. This suggests that this bar 
is particularly relevant to the intended message of the graphic. Consequently, we 
use the attributes of the bar (such as its label) to instantiate primitive perceptual 
tasks and produce tasks that are hypothesized to be salient. 

A model of the intended recipient of the information graphic also plays a role 
in the plan recognition process. In designing the information graphic, the graphic 
designer takes into account mutual beliefs about entities that will be particu- 
larly salient to his audience. For example, if an information graphic appears in 
a document targeted at residents of Cambridge, then both the designer and the 
viewer will mutually believe that entities such as Cambridge, its sports teams, 
etc. will be particularly salient to the viewer. Our viewer model captures these 
beliefs, and our approach is to treat them in a manner similar to the way in 
which we handle noun phrases in captions. 

3 Estimating Perceptual Task Effort 

Given a set of data, the graphic designer has many alternative ways of designing 
a graphic. As Larkin and Simon note, information graphics that are information- 
ally equivalent (all of the information in one graphic can also be inferred from
the other) are not necessarily computationally equivalent (enabling the same
inferences to be drawn quickly and easily) [12]. Peebles and Cheng further ob- 
serve that even in graphics that are informationally equivalent, seemingly small 
changes in the design of the graphic can affect viewers’ performance of graph 
reading tasks [15]. Much of this can be attributed to the fact that design choices
made while constructing an information graphic will facilitate some perceptual 
tasks more than others. Following the AutoBrief work on generating graphics to 
achieve communicative goals, we hypothesize that the designer chooses a design 
that best facilitates the tasks that are most important to conveying his intended 
message, subject to the constraints imposed by competing tasks [9].

In order to identify the perceptual tasks that the graphic designer has best 
enabled in the graphic, our methodology is to apply the results of research from 
cognitive psychology to construct rules that estimate the effort required for differ- 
ent perceptual tasks within a given information graphic. Our working hypothesis 
is that the easiest tasks are good candidates for tasks that the viewer was in- 
tended to perform, since the designer went to the effort of making them easy to 
accomplish. We can then use this set of the easiest perceptual tasks along with 
any unusually salient tasks as a starting point for our inference process. By rea- 
soning about the more complex tasks in which these perceptual tasks play a role, 
we can hypothesize the message that the graphic designer intended the viewer 
to extract from the graphic. The component of our system that is responsible 
for estimating effort is called APTE (Analysis of Perceptual Task Effort).
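
As a rough illustration of how the easiest and the salient tasks might be combined into a starting set, consider the sketch below. The function name, the task labels, and the cutoff `n_easiest` are assumptions made for the example; the text does not specify how many of the easiest tasks are retained. The effort values are the sums given by the conditions of the rules in Figures 2 and 3.

```python
def starting_tasks(effort_by_task, salient_tasks, n_easiest=2):
    """Return the n lowest-effort tasks followed by any salient tasks
    that are not already among them."""
    ranked = sorted(effort_by_task, key=effort_by_task.get)
    chosen = ranked[:n_easiest]
    for task in salient_tasks:
        if task not in chosen:
            chosen.append(task)
    return chosen


# Hypothetical task instances with effort estimates in the rules' units.
efforts = {
    "perceive-relative-diff(bar-1995, bar-2000)": 92 + 460 + 150,
    "perceive-dependent-value(bar-2000)": 150 + 300,
    "perceive-relative-diff(bar-1999, bar-2000)": 92 + 230 + 150,
}
salient = ["perceive-label(bar-2000)"]

seeds = starting_tasks(efforts, salient)
```

Here the two cheapest tasks are kept and the salient task is appended, giving three seed tasks from which the inference process could start.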

3.1 Analysis of Perceptual Task Effort 

The goal of APTE is to determine whether a task is easy or hard with respect 
to other perceptual tasks that could be performed on an information graphic. 
In order to estimate the relative effort involved in performing a task, we adopt
a GOMS-like approach [2], decomposing each task into a set of component tasks. 
Following other cognitive psychology research, we take the principal measure of 
the effort involved in performing a task to be the amount of time that it takes 
to perform the task, and our effort estimates are based on time estimates for the 
component tasks. In this sense, our work follows that of Lohse [13] in his UCIE 
system, a cognitive model of information graphic perception intended to simulate 
and predict human performance on graphic comprehension tasks. However, we 
are not attempting to develop a predictive model of our own; our aim is to
identify the tasks that the designer would expect to have best facilitated by his 
design choices in order to utilize that information in the plan inference process. 



Structure of Rules. APTE contains a set of rules that estimate how well a task 
is enabled in an information graphic. Each rule captures a perceptual task that 
can be performed on a particular type of information graphic (line graph, bar 
chart, etc.), along with the conditions (design choices) that affect the difficulty 
of performing that task. The conditions for the tasks are ordered so that the 
conditions producing the lowest estimates of effort appear first. Often several

  Rule-1: Estimate effort for task
          Perceive-dependent-value(<viewer>, <g>, <att>, <e>, <v>)
  Graphic-type: bar-chart
  Gloss: Compute effort for finding the exact value <v> for attribute <att>
         represented by top <e> of a bar <b> in graph <g>
  B1-1: IF the top <e> of bar <b> is annotated with a value,
        THEN effort=150 + 300
  B1-2: IF the top <e> of bar <b> aligns with a labelled tick mark on
        the dependent axis, THEN effort=scan + 150 + 300

Fig. 2. A rule for estimating effort for the perceptual task Perceive-value

conditions within a single rule will be satisfied - this might occur, for example, 
in the rule shown in Figure 2 which estimates the effort of determining the exact 
value represented by the top of a bar in a bar chart. Condition-computation pair 
B1-1 estimates the effort involved when the bar is annotated with the value; this
condition is illustrated by the second and fourth bars in Figure 1. The second
condition-computation pair, B1-2, is applicable when the top of the bar aligns
with a labelled tick mark on the dependent axis; this condition is illustrated by 
all bars except the second bar in Figure 1. If the top of a bar both falls on a tick 
mark and has its value annotated at the top of the bar (as in the fourth bar 
in Figure 1), the easiest way to get the value represented by the top of the bar 
would be to read the annotated value, although it could also be obtained by 
scanning across to the tick mark. When multiple conditions are applicable, the 
first condition that is satisfied will be applied to calculate the effort estimate, 
thereby estimating the least expected effort required to perform the task. 
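
The ordered evaluation of conditions can be sketched as follows. This is an illustrative reading of the rule in Figure 2, not the APTE implementation: the `Bar` class, its field names, and the numeric scan cost are hypothetical, and only the two conditions shown in the figure are encoded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bar:
    """Hypothetical description of one bar in a bar chart."""
    annotated_with_value: bool
    aligned_with_labelled_tick: bool
    scan_effort: float  # assumed cost of scanning across to the dependent axis


def perceive_dependent_value_effort(bar: Bar) -> Optional[float]:
    """Estimate effort using the two conditions shown for Rule-1.

    Conditions are listed in order of increasing estimated effort, and
    the first one that is satisfied determines the estimate.
    """
    conditions = [
        # B1-1: the value is annotated at the top of the bar.
        (lambda b: b.annotated_with_value,
         lambda b: 150 + 300),
        # B1-2: the top of the bar aligns with a labelled tick mark.
        (lambda b: b.aligned_with_labelled_tick,
         lambda b: b.scan_effort + 150 + 300),
    ]
    for applies, compute in conditions:
        if applies(bar):
            return compute(bar)
    return None  # neither condition shown in the figure applies


# A bar that is both annotated and aligned: B1-1 is tried first and wins.
both = perceive_dependent_value_effort(Bar(True, True, scan_effort=80.0))
# A bar that is only aligned with a tick mark: B1-2 applies.
tick_only = perceive_dependent_value_effort(Bar(False, True, scan_effort=80.0))
```

Because the cheaper condition is listed first, a bar satisfying both conditions receives the lower estimate, which corresponds to taking the least expected effort.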

Developing Effort Estimates. Researchers have examined many different
perceptual tasks, although often studying individual perceptual tasks in isola- 
tion. As mentioned earlier, we have followed Lohse’s approach [13] in breaking 
down our tasks into component tasks. We then utilize existing time estimates 
(primarily those applied in Lohse’s UCIE system) for the component tasks wher- 
ever possible. For some perceptual tasks, this has been a sufficient foundation 
for our rules. For example, we developed effort estimates for the rule shown in 
Figure 2 in this manner. In the case of condition-computation pair B1-1 (finding
the exact value for a bar where the bar is annotated with the value), the effort is
estimated as 150 units for discriminating the label (based on work by Lohse [13])
and 300 units for recognizing a 6-letter word [7]. In the case of B1-2 (finding the
exact value for a bar where the top of the bar is aligned with a tick mark on the 
axis), the effort estimate includes scanning over to the dependent axis (measured 
in terms of distance in order to estimate the degrees of visual arc scanned [11]) 
in addition to the effort of discriminating and recognizing the label. 

For more complex tasks that have not been explicitly studied by cognitive 
psychologists, we have applied existing principles and laws in the development 
of our rules for estimating perceptual effort. An example of this is the class of 
comparison tasks (for example, comparing the tops of two bars to determine the 
relative difference in value), where the proximity compatibility principle defined 
by Wickens and Carswell [19] plays a major role. This principle is based on 
two types of proximity: perceptual proximity refers to how perceptually similar
two elements of a display are (in terms of spatial closeness, similar annotations, 
color, shape, etc.) while processing proximity refers to how closely linked the two 
elements are in terms of completing a particular task. If the elements must be 
used together (integrated) in order to complete a task, they have close processing 
proximity. The proximity compatibility principle states that if there is close 
processing proximity between two elements, then close perceptual proximity is 
advised. If two elements are intended to be processed independently, then distant 
perceptual proximity is advised. Violating the principle will increase the effort 
required for a viewer to process the information contained in the display. 

We assume that the graphic designer attempted to follow the proximity com- 
patibility principle in designing the information graphic so as to facilitate in- 
tended tasks and make them easier to perform than if the principle were violated. 
This assumption is reflected in the rule in Figure 3, where the effort required 
to perform the integrated task of determining the relative difference between 
two bars differs depending on the bars’ spatial proximity. For adjacent bars, the 
effort required will generally be lower than if the bars were not adjacent. 

Weber’s Law [4] has also played a critical role in our rules. Many of the tasks 
for which we have had to develop effort estimates involve discriminating between 
two or more graphical elements; these tasks require the viewer to make compar- 
ative judgments of length, area, and angle. In order to define the conditions 
affecting the complexity of these judgments, we have applied Weber’s Law [4]. 
One of the implications of Weber’s Law is that a fixed percentage increase in line 
length or area is required to enable discrimination between two entities (and the 
Rule-3: Estimate effort for task 

Perceive-relative-diff(<viewer>, <g>, <e1>, <e2>, <b1>, <b2>, <r>, <d>) 
Graphic-type: bar-chart 

Gloss: Compute effort for finding the relative difference <r> in value (greater than/ 
less than/equal to) and degree <d> of difference (high/low/none) represented 
by the tops <e1> and <e2> of two bars <b1> and <b2> in graph <g> 

B3-1: IF bar <b1> and bar <b2> are adjacent and the height difference 
      is >10% THEN effort=92 + 230 + 150 
B3-2: IF bar <b1> and bar <b2> are not adjacent and the height 
      difference is >10% THEN effort=92 + 460 + 150 
B3-3: IF bar <b1> and bar <b2> have height difference >5% 
      THEN effort=92 + 920 + 150 

Fig. 3. A rule for estimating effort for perceptual task Perceive-relative-diff 



probability of discrimination is affected not by object size, but by the percentage 
increase). Weber’s Law has influenced the thresholds used in rules for estimating 
effort such as Rule-3 in Figure 3 where thresholds in the percentage difference in 
the height of the bars influence the effort required to perceptually discriminate 
the relative difference between the values represented by the bars. 
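
As a rough illustration of this implication of Weber’s Law, the sketch below judges discriminability by relative rather than absolute difference. The function name, the choice to measure the difference against the smaller magnitude, and the 5% default threshold are assumptions made for illustration (5% is borrowed from the lowest threshold in Rule-3; it is not a measured Weber fraction).

```python
def is_discriminable(length_a, length_b, weber_fraction=0.05):
    """Return True if the relative difference exceeds the assumed threshold.

    The difference is measured relative to the smaller magnitude; this is
    an assumption made for illustration.
    """
    smaller, larger = sorted((length_a, length_b))
    if smaller <= 0:
        raise ValueError("lengths must be positive")
    return (larger - smaller) / smaller > weber_fraction


# The same absolute difference of 2 units in both cases:
print(is_discriminable(10, 12))    # a 20% increase
print(is_discriminable(100, 102))  # a 2% increase
```

Under these assumptions the first pair is discriminable and the second is not, even though the absolute difference is identical.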

In some cases, the optimal combination of component tasks does not take 
into account the escalating complexity represented by the conditions of the rule. 
For example, our eye-tracking experiments showed that viewers performed an 
average of four saccades if the bars to be compared differ in height by 5% to 
10% and an average of two saccades if the non-adjacent bars’ height difference 
was greater than 10%. In both cases, one saccade (from the top of the lower 
bar to the top of the higher bar) would be optimal in the sense of providing 
the necessary information. Our rules capture the expected number of saccades 
required by the average viewer in order to perform the necessary perceptual 
judgment. The effort estimates in Figure 3 show the estimate of 92 units to 
perform a perceptual judgment [18] along with a multiple of 230 units where 230 
represents the estimate for a saccade [17] and the effort of discriminating the 
top of the higher bar (150 units based on [13]). The conditions and estimates in 
Rule-3 (Figure 3) reflect the results of our experiment (described in Section 4). 
The eye-tracking data guided the development of the thresholds and showed that 
when the height difference was small (between 5% and 10%), the bars’ adjacency 
did not have a discernible effect on the number of required saccades. 
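
The conditions of Rule-3 can be sketched in code as follows. The constants come from Figure 3 and the paragraph above; the function names, the convention of measuring the height difference relative to the shorter bar, and the decision to return None when no condition applies are assumptions made for illustration.

```python
PERCEPTUAL_JUDGMENT = 92   # units for one perceptual judgment [18]
SACCADE = 230              # units for one saccade [17]
DISCRIMINATE_TOP = 150     # units for discriminating the top of the higher bar [13]


def relative_height_difference(height_1, height_2):
    """Height difference as a fraction of the shorter bar (an assumption)."""
    shorter, taller = sorted((height_1, height_2))
    if shorter <= 0:
        raise ValueError("bar heights must be positive")
    return (taller - shorter) / shorter


def rule_3_effort(height_1, height_2, adjacent):
    """Return the Rule-3 effort estimate, or None if no condition applies."""
    diff = relative_height_difference(height_1, height_2)
    if diff > 0.10:
        saccades = 1 if adjacent else 2   # B3-1 / B3-2
    elif diff > 0.05:
        saccades = 4                      # B3-3: adjacency has no effect
    else:
        return None                       # Figure 3 gives no condition here
    return PERCEPTUAL_JUDGMENT + saccades * SACCADE + DISCRIMINATE_TOP


print(rule_3_effort(80, 100, adjacent=True))    # B3-1
print(rule_3_effort(80, 100, adjacent=False))   # B3-2
print(rule_3_effort(93, 100, adjacent=True))    # B3-3
```

The number of saccades is the only term that varies across the three conditions, which mirrors the multiples of 230 in the figure.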

3.2 Applying Effort Estimates 

After applying the APTE rules to identify the set of the easiest perceptual 
tasks for a given information graphic, and then reasoning about the more complex 
tasks in which these perceptual tasks play a role, we can hypothesize 
the message that the graphic designer intended the viewer to extract from the 
graphic. Consider, for example, the graphic shown in Figure 1. Because the 
graphic designer has chosen to annotate the two bars representing 1970 and 
1990 with their exact values, the task of perceiving the exact value for these 
two bars will appear in the set of the easiest perceptual tasks for this graphic. 
We then infer that higher-level (more complex) tasks that include these tasks as 
subgoals are good candidates to represent the possible communicative intention 
of the graphic designer. A task that would be considered a good candidate in 
this example is comparing the values represented by the two bars, since this task 
not only includes two of the easiest perceptual tasks, but the instantiation of the 
parameters in this task is appropriate according to the proximity compatibility 
principle since the two bars with close processing proximity (the two bars be- 
ing compared) have close perceptual proximity (are both annotated with their 
values). Other lesser candidates would include finding the relative difference in 
capital expense in 1970 versus 1980, 1980 versus 1990, and 1990 versus 2000. 
These candidates would be supported by the fact that the tasks of finding the 
relative difference between these pairs of bars are in the set of easiest percep- 
tual tasks for the graphic, since the pairs of bars are adjacent and have large 
percentage differences in height. If other evidence, such as a helpful caption, 
highlighting techniques or a relevant user model, is available, this evidence will 
also be taken into account in the inference process. 
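
The idea that higher-level tasks become candidates when their subgoals are among the easiest perceptual tasks can be illustrated with a simple counting heuristic. This is only a sketch: the function name, the task names, and the scoring by number of easy subgoals are hypothetical, and the inference process described here also weighs other evidence that the sketch ignores.

```python
def rank_candidate_tasks(easiest_tasks, higher_level_tasks):
    """Order higher-level tasks by how many of their subgoals are easy.

    easiest_tasks: set of perceptual task names.
    higher_level_tasks: dict mapping a task name to a list of its subgoals.
    Tasks with no easy subgoal are dropped.
    """
    scored = []
    for task, subgoals in higher_level_tasks.items():
        easy = sum(1 for goal in subgoals if goal in easiest_tasks)
        if easy > 0:
            scored.append((task, easy))
    # The sort is stable, so tied tasks keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [task for task, _ in scored]


# Hypothetical task names loosely modeled on the Figure 1 example.
easiest = {"exact-value(1970)", "exact-value(1990)", "relative-diff(1970, 1980)"}
tasks = {
    "compare(1970, 1990)": ["exact-value(1970)", "exact-value(1990)"],
    "compare(1970, 1980)": ["relative-diff(1970, 1980)"],
    "find-maximum": ["scan-all-bars"],
}
print(rank_candidate_tasks(easiest, tasks))
```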

4 Evaluating and Modifying APTE 

This section describes an eye tracking experiment that was conducted to evalu- 
ate the APTE rules for bar charts and to suggest revisions to these rules. A set 
of rules describing tasks in which we were interested was developed based on 
the cognitive principles described in Section 3.1. Information graphics were then 
designed to test the various conditions of the tasks. The results from the ex- 
periment were used both to verify that the cognitive principles that guided the 
development of the rule set were appropriately applied and to suggest modifica- 
tions to individual conditions of the rules within the rule set. 



4.1 Method 

Eleven participants 1 were asked to perform various tasks using vertical (column) 
bar charts shown to them on a computer monitor while their eye fixations were 
recorded. Each task was completed by seven of the participants. Examples of the 
tasks include finding the bar representing the maximum value in the bar chart 
and finding the exact value represented by the top of a particular bar in a bar 
chart. For each task, participants were shown a screen with the instructions for 
the task displayed. The instructions for each task included some specific action to 
be taken by the participants to indicate the results of the task. These actions fell 

1 A twelfth participant could not be calibrated on the eye tracker. 
into two categories; in the first category, the result of the task was indicated by 
the participants clicking on an element of the information graphic, while results 
of tasks in the second category were indicated by the participants clicking on one 
of three buttons shown below the information graphic. Both categories of tasks 
included a mouse movement and a mouse click, and the time of the start of the 
mouse movement and the time and location of the mouse click were recorded 
as part of the data collected during the experiment. When the participants had 
read and felt that they understood the instructions, they clicked the mouse. The 
next screen that the participants were shown contained only a fixation point. 
After clicking on the fixation point, the participants were shown the bar chart 
on which they were to perform the prescribed task. The participants moved to 
the next task by clicking on a “Done” button shown at the bottom right corner 
of the screen. 



4.2 Design 

The experiment was designed to obtain the average time required to complete 
a given task across participants. Six bar charts were constructed that displayed 
a variety of different characteristics (increasing versus decreasing trends, varying 
numbers of bars, sorted versus unsorted labels, bars sorted by height or unsorted, 
etc.). We call these six bar charts the “base” bar charts, since the actual bar 
charts used in the experiment were variants of these. The APTE rule set for bar 
charts currently contains ten rules describing various perceptual tasks that can 
be performed using a bar chart. For a given base bar chart, only a subset of 
the rules and their conditions could be analyzed. For example, there are three 
rules describing trends - one for increasing trends, one for decreasing trends 
and one for stable trends. In this experiment, each base bar chart contained 
only one trend, so at least two of the rules would not apply to a given base bar 
chart. However, other rules might have multiple conditions that could all apply 
to a given base bar chart (for example, applying the rule shown in Figure 3 to 
different pairs of bars). The set of tasks being evaluated varied between base bar 
charts, but always included at least one condition of each applicable rule. 

In order to prevent participants from becoming familiar with the six base 
bar charts being analyzed, the actual test graphics were variants of the base bar 
charts. In designing the test graphics, characteristics of the base bar chart that 
were extraneous to the task being evaluated were altered. For example, if the 
task was to locate and read the label of a given bar in the base bar chart, the 
attribute name displayed on the y-axis and the heights of the bars not involved 
in the task would be altered in the test graphic (see Figure 4). The order in 
which the participants completed the tasks was also varied so as to avoid effects 
of familiarity with the content of the specific information graphics and expertise 
obtained through practice in performing the requested tasks. 
Fig. 4. Base bar chart example (left) and test graphic for get-label (right) 



4.3 Procedure 

Each trial began with a series of five practice tasks. Participants were informed 
that the first five tasks were for practice, and were allowed to ask any questions 
about the format of the experiment during the warmup period. Participants were 
given several points of instruction. For tasks that required participants to choose 
an answer shown on one of three response buttons, participants were instructed 
to look only at the graphic in completing the task, and to determine the answer 
to the task before reading the labels on the buttons. For all tasks, participants 
were asked to only move the mouse when they were ready to make a response. 
After the fifth practice graphic, the participants were presented with the series of 
tasks comprising the experiment. The participants’ eye fixations were measured 
using an Iscan Model RK-716PCI eye tracking processor operating at 60 Hz. 

4.4 Data Analysis 

The aim of this experiment was to obtain the average completion time of all 
participants for a given task, and to compare the rank order of those average 
completion times to the rank order of estimates produced by the APTE rules for 
those same tasks. To obtain the best possible measure of the completion time of 
each task, the completion time for each set of data was determined from a 
combination of the time of the initial mouse movement and the pattern of the 
participant’s eye fixations. For tasks that required the user to click on a button, task completion 
time was recorded as the beginning of the mouse movement if that movement 
was just prior to the participant moving his gaze to the region of the screen where 
the buttons were located. If the mouse movement did not coincide with the shift 
in gaze away from the graphic and towards the buttons, the task completion 
time was recorded as the end time of the final fixation within the information 
graphic. For tasks that required the participant to click on an element of the 
information graphic, task completion time was recorded as the beginning of the 
mouse movement if that movement took place during the same fixation in which 
the participant “clicked” on the appropriate element. If the mouse movement 
did not coincide with this eye fixation, the completion time of the task was 
recorded as the beginning of the fixation during which the participant selected 
the appropriate graphical element. 
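
The decision procedure for task completion time described above can be summarized in a short sketch. The function and parameter names are hypothetical, and the judgment of whether a mouse movement "coincides" with the relevant gaze event is assumed to have been made beforehand.

```python
def task_completion_time(response_type, mouse_move_start, mouse_coincides,
                         fallback_fixation_time):
    """Pick the completion time for one trial.

    response_type: "button" or "element".
    mouse_move_start: time at which the mouse movement began.
    mouse_coincides: for "button" tasks, True if the movement came just
        before the gaze shifted to the button region; for "element" tasks,
        True if it occurred during the fixation in which the element was
        selected.
    fallback_fixation_time: for "button" tasks, the end time of the final
        fixation within the graphic; for "element" tasks, the start time of
        the fixation during which the element was selected.
    """
    if response_type not in ("button", "element"):
        raise ValueError("response_type must be 'button' or 'element'")
    return mouse_move_start if mouse_coincides else fallback_fixation_time


print(task_completion_time("button", 2.4, True, 2.1))
print(task_completion_time("element", 3.0, False, 2.7))
```

In both task categories the mouse movement is preferred when it lines up with the eye data; otherwise the fixation data alone determines the completion time.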

Recall (Section 4.2) that variants of each base bar chart were constructed to 
avoid participants becoming familiar with particulars of the bar chart. The base 
bar charts were divided into two sets. For each set, the participants evaluated 
all of the tasks related to the base bar chart using the same variants. For each 
of the six base bar charts, the list of average task completion times 2 (over all 
participants) was sorted. These sorted task lists were compared to the sorted 
lists of the effort estimates produced by the APTE rules. The emphasis in this 
analysis was on the relative rank of each task within the sorted lists, rather than 
on the actual values. Any discrepancies in the task ordering for an individual 
graph were noted. Of particular interest were any perceptual tasks where the 
rankings did not correlate across different base bar charts. The eye fixations for 
these tasks were then analyzed in closer detail in order to detect patterns in the 
fixations that would support a change in the APTE rule for the task. Several 
APTE rules were modified based on the data gathered in the experiment, and 
these changes, along with an overall analysis of the data are discussed in the 
next section. 
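
A minimal sketch of the rank comparison for a single base bar chart is given below. The function names are hypothetical, the values are made up for illustration and are not data from the experiment, and ties are not handled.

```python
def rank_order(values_by_task):
    """Return task names sorted from lowest to highest value."""
    return sorted(values_by_task, key=values_by_task.get)


def rank_discrepancies(avg_times, effort_estimates):
    """List tasks whose rank differs between the two orderings.

    Both dictionaries are assumed to contain the same task names.
    """
    by_time = rank_order(avg_times)
    by_effort = rank_order(effort_estimates)
    return [task for task in by_time
            if by_time.index(task) != by_effort.index(task)]


# Made-up values for illustration only.
avg_times = {"get-label": 1.2, "find-max": 2.0, "relative-diff": 1.6}
efforts = {"get-label": 400, "find-max": 700, "relative-diff": 900}
print(rank_discrepancies(avg_times, efforts))
```

Only the relative positions matter here, which matches the emphasis on rank rather than on the actual values.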

4.5 Results and Discussion 

The comparison of the rankings of average completion times and the correspond- 
ing effort estimates for each of the six base bar charts showed strong support for 
the cognitive principles on which the APTE rules are based. For example, the 
hypothesis that it would require less effort to compare bars that are adjacent 
(based on the proximity compatibility principle [19]) was upheld, as was the ap- 
plication of Weber’s Law [4] in developing rule conditions based on thresholds in 
the percentage of height difference of the bars. However, as intended, the results 
of the experiment also provided evidence of ways in which the APTE rules could 
be modified in order to improve the quality of the effort estimates produced. 

One area in which we applied the results of the experiment was in the effort 
estimates for perceiving trends in bar charts. Our initial APTE rules for trends 
represented the perception of a trend in terms of simple scans of the bar chart, 
while the eye fixation data showed a less smooth, slower processing of the data. 
By altering our APTE trend rules to represent the pairwise perceptual judgments 
supported by the eye fixation data, we were able to more accurately assess the 
effort required for trend recognition. 

The experiment also yielded some unexpected insights into the way in which 
participants process information graphics. For example, our initial APTE rule 

2 Data was excluded from the results if the participant’s gaze left the information 
graphic to view the labels on the buttons before completing the task, if the partic- 
ipant responded incorrectly to the task, or if it was clear that the participant was 
performing processing not required by the task. 
for finding the exact value represented by the top of a bar when the top of the bar 
is aligned with a tick mark on the axis (Figure 2) was expected to require far less 
effort to complete than the task of perceiving the data required to interpolate 
the value represented by the top of the bar when the top of the bar does not 
align with a tick mark. However, the patterns of eye fixations of the participants 
showed that when the top of the bar is aligned with a tick mark, participants 
frequently repeat the task (presumably to ensure accuracy). This resulted in 
a change to the rule shown in Figure 2 so that the effort estimate for B1-2 is 
now equal to 230 + (scan + 150 + 300) × 2. Similarly, in graphics where the 
labels along the primary key axis are unsorted, our initial APTE rule for finding 
the top of a bar given the bar’s label described the viewer as performing a left- 
to-right scan and sampling of the unsorted labels. In analyzing the results of the 
experiment, we found that viewers were far more likely to begin in the middle of 
the axis and search to the left or right, then saccade to the other half of the axis 
if they are unsuccessful in their initial search. Without being able to analyze the 
patterns of participants’ eye fixations, we would have been unable to capture 
this search process and the resultant effort estimate. 

The somewhat surprising results outlined above give rise to some interest- 
ing questions about what should actually be captured by the APTE rules. As 
described previously, our use of the APTE rules and the resulting ranking of 
effort estimates reflects our hypothesis that since the graphic designer has many 
alternative ways of designing a graphic, the designer chooses a design that best 
facilitates the tasks that are most important to conveying his intended message, 
subject to the constraints imposed by competing tasks [9]. Underlying this 
hypothesis is the assumption that the graphic designer is competent, and that 
a competent graphic designer has a fairly accurate model of what is required to 
perceptually facilitate the tasks. We have based this assumption on the wealth 
of resources describing ways in which graphic designers can and should facilitate 
tasks for their viewers ([4] and [10], for example) and the observation that many 
of the techniques described in these resources correspond to the cognitive prin- 
ciples upon which we based our APTE rules. However, it is unclear that graphic 
designers would have an accurate model of some of the less expected results, 
such as the similarity in effort between reading a value from a tick mark and 
gathering the information necessary to interpolate the value. It seems reasonable 
that a graphic designer wishing to facilitate the task of determining the exact 
value would align the top of the bar with a tick mark rather than forcing the 
viewer to interpolate the value. 3 This problem of what to represent in the APTE 
rules accentuates the distinction between the actual effort which viewers expend 
in performing particular tasks versus the difficulty the viewer might reasonably 
be expected to have in performing the task. Viewers tend to repeat the process of 
locating, discriminating and reading the value on a tick mark, but the expected 

3 Of course, following that same argument, it would seem even more reasonable that 
the graphic designer wishing to facilitate the finding of the exact value represented 
by the top of a bar would annotate the top of the bar with the value, and we did find 
that task to require substantially less effort than the other tasks being described. 
difficulty of that task would not include this repetition. Another example of this 
is the task of finding a particular label amongst the unsorted labels along the 
primary key axis of the bar chart. A graphic designer might reasonably place 
a bar first on the axis, expecting that this will reduce the difficulty of locating the 
bar (expecting the viewer to use a left-to-right search). In practice, it seems that 
viewers begin in the middle of the axis, so the task might be better facilitated 
by placing the bar in the middle of the bar chart. 

However, in order to base the APTE rules on the expected difficulty of the 
tasks rather than the actual effort required to complete them, we would need to 
have evidence of what is contained in the stereotypical graphic designer’s model 
of expected difficulty. Up to this point, we have been using the resources and 
guidelines published for graphic designers along with the assumption of com- 
petence of the graphic designer in order to draw reasonable conclusions about 
the knowledge of the stereotypical graphic designer. Unfortunately, the resources 
that we have examined do not provide guidelines for the tasks in question. Lack- 
ing evidence to support this intuitive distinction between expected difficulty and 
the actual effort expended in performing tasks, we have based our APTE rules 
on the solid evidence regarding the effort expended by participants performing 
these tasks that we have collected during this experiment. An interesting future 
area of research would be to investigate the difference between expected difficulty 
and actual effort and its influence on the choices made by graphic designers. 



Statistical Analysis of Correlation. Having modified the APTE rules to bet- 
ter reflect the patterns of eye fixations demonstrated by viewers, we produced 
a new set of effort estimates based on the modified rules. We then performed a 
statistical analysis to test the correlation between the average completion times 
and the effort estimates produced by our rules. We performed two types of cor- 
relation tests. The Spearman Rank-Order Correlation (rho), used to determine 
whether two sets of rank-ordered data are related, is an especially appropriate 
choice for analyzing this data since we are primarily interested in ranking the 
effort estimates generated by APTE. However, since we use the gaps in the effort 
estimates generated by APTE in order to identify the set of easiest perceptual 
tasks, we also did a Pearson Product-Moment Correlation (r) on the actual 
values. Figure 5 shows the results of both correlations between the average com- 
pletion times and our effort estimates for each of the six base bar charts (values 
approaching 1 show a strong correlation). The p-values are one-tailed and were 
calculated by t-approximation. The results of the Spearman Rank-Order Corre- 
lation show a very high and significant correlation between the ranking of task 
effort provided by the APTE rules and the actual completion times. Interest- 
ingly, the Pearson correlation also shows a very strong correlation between the 
average completion times from the experiment and our APTE effort estimates. 
We intend to run additional experiments to further validate the modified rules 
with new data. 
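
For readers who wish to reproduce this kind of analysis, the two coefficients can be computed as in the following sketch. It is written in plain Python under our own naming, handles ties in the Spearman ranking by averaging ranks, and does not compute the one-tailed p-values; it is not the software used in the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank-order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))
```

Given a list of average completion times and a list of effort estimates for the same tasks, spearman_rho compares only the orderings, while pearson_r is also sensitive to the sizes of the gaps between values.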
Fig. 5. Correlations between average completion time and APTE effort estimates 



5 Conclusion 

In this paper, we have outlined our approach to a novel application of plan 
recognition: recognizing the intended message of information graphics. The ability 
to infer the intended message of a graphic plays a vital role in 1) providing 
alternative access to information graphics for visually impaired viewers, and 2) 
providing access to publications in digital libraries via the content of information 
graphics. Although we utilize multiple types of evidence (caption, highlighting, 
user model) in the intention recognition process, this paper focused on one spe- 
cific type of evidence, perceptual task effort. We discussed the role of perceptual 
task effort in recognizing the designer’s intended message, described our rules 
for estimating effort, and presented the results of an experiment which provides 
strong evidence of the correlation between the effort estimates generated by the 
modified APTE rules and the relative difficulty that humans have in performing 
the corresponding perceptual tasks. In future work, we will expand our APTE 
rules to encompass other graph types, such as line graphs and pie charts. 
Abstract. People sometimes appear to represent graphical information by 
analogy to space. In this paper we consider the extent to which the tendency to 
represent information by analogy to space calls on spatial resources. We also 
examine whether people who represent graphical information spatially also 
represent numerical information using a spatial number line. Forty-eight adult 
participants carried out a series of graphical reasoning, number judgement and 
spatial working memory tasks. Evidence was found to suggest that people were 
forming spatial representations in both the number judgement and graphical 
reasoning tasks. Performance on the spatial memory task was positively 
associated with a measure of the tendency to use spatial representations on the 
graph task. In addition, measures of the use of spatial representations for the 
graph and number tasks were associated. We interpret our results as providing 
further evidence that people often represent graphical information by analogy to 
space. We conclude with a discussion of whether the use of such spatial 
representations is confined to any one task or is instead a general 
representational strategy employed by people high in spatial ability. 



1 Introduction 

Much everyday cognition appears to involve representations that are spatial in nature. 
For example, people often appear to represent information about relationships 
between objects in the world by analogy to space. Such representations can form the 
basis for subsequent relational inferences [for a review, see 1]. Recently, we have 
applied some of the techniques previously used to examine spatial representations for 
relational inference to demonstrate that people represent graphical information by 
analogy to space [2; 3; 4]. These findings (which we will review in more detail 
below) suggest that accounts of graph comprehension that assume wholly 
propositional representational systems are incomplete [5; 6]. They also support 
accounts of graph comprehension that stress the importance of spatial transformations 
and mental model representations [7; 8]. 

Spatial representations have also been identified in number-cognition. It has been 
suggested that numbers may be represented by analogy to a spatial mental number 
line [9], which (in native English speakers) runs left-to-right with smaller numbers on 
the left and larger numbers on the right. However, there is debate as to whether the 
number line is a core component of the representation of number [10]. 
Our goal in this paper is to examine individual differences in people's use of spatial 
representations for thinking about graphs. We wish to examine whether people whose 
interaction with graphs suggests that they are using a spatial representational strategy 
tend to be higher in spatial ability than people who do not appear to be using spatial 
representations for graphical reasoning. In addition, we wish to investigate whether 
the tendency to use a spatial reasoning strategy in graphical reasoning is predicted by 
the tendency to adopt a spatial representation of number. The answer to the first of 
these questions has a bearing on how we account for the ability to comprehend 
graphs, whilst an answer to the second question will help us to understand the role 
played by space in abstract cognition. In addition, our findings may have a bearing on 
questions about whether spatial representations can be domain-specific [11] or are 
underpinned by a domain-general resource that may be adapted to particular tasks via 
spatial mappings. 



1.1 Spatial Representations of Graphical Information 

In a series of experiments [2; 4] we have attempted to demonstrate that, at least 
sometimes, people represent graphical information by analogy to space. In these 
experiments we have required participants to carry out an unfamiliar graphical task 
many times in succession. Such a procedure allows for greater experimental control 
and permits a detailed investigation of the mental representation of graphical 
information that is not afforded by other tasks higher in face validity. For example, 
Webber and Feeney presented participants with series of pairs of line graphs or 
premise displays (see Figure 1). Each of these graphs described two referents, one of 
which was common to both graphs. Once participants had examined these graphs they 
vanished from the screen and were replaced by a third graph describing the 
relationship between two of the referents from the previous screen. We will refer to 
this as the conclusion graph. 

[Figure 1: panels "Premise Displays" and "Conclusion Graphs", with referents labeled A, B and C; the graphs themselves are not reproduced here.] 

Fig. 1. Sample materials from graphical relational reasoning task 
We manipulated whether the points in the premise displays ascended (see top line 
of Fig. 1) or descended (see bottom line of Fig. 1) left-to-right and the position of the 
repeated referent. In half our trials the non-repeated terms were separated by the 
repeated term (separated trials - see top half of Fig. 1) whilst in the other half the 
non-repeated terms occurred together. That is, they were not separated by the repeated 
term (together trials - see bottom half of Fig. 1). We manipulated whether the 
conclusion graph followed logically from the premise displays and whether the order 
of the referents in the conclusion graph was consistent with their order in the premise 
displays. For half of the trials referent order was consistent between premise and 
conclusion displays (see top panel of Fig. 1) whilst for the other half order was 
inconsistent (see bottom panel of Fig. 1). 

If participants use spatial representations in order to reason about the relationship 
between the premise and conclusion graphs, then we would expect to see reordering 
effects. We predicted these reordering effects on the grounds that a spatial 
representation of the information in the graphical premises is likely to (a) be 
integrated and (b) order the terms in the premises by size. Problems where the 
non-repeated terms are separated by the repeated term (separated trials) should not require 
reordering whereas problems where the non-repeated terms occur together (together 
trials) should require reordering. We expected inspection times - time taken to view 
the premise graphs - to reflect the need for reordering. We found that inspection 
times for together premise graphs were reliably longer than for separated premise 
graphs. In the work that follows, when this difference is positive we will assume that 
people have used a reordering strategy that is diagnostic of spatial representation. 
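
The inspection-time difference used as an index of reordering can be written down directly. The sketch below assumes one participant's inspection times are available as two lists; the function names and the sample values are hypothetical.

```python
from statistics import mean


def reordering_index(together_times, separated_times):
    """Mean inspection time for together trials minus separated trials."""
    return mean(together_times) - mean(separated_times)


def uses_reordering_strategy(together_times, separated_times):
    """A positive difference is taken as diagnostic of spatial representation."""
    return reordering_index(together_times, separated_times) > 0


# Made-up inspection times in seconds, for illustration only.
print(reordering_index([5, 7], [4, 4]))
print(uses_reordering_strategy([5, 7], [4, 4]))
```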

We also predicted that referent order in the premise graphs would interact with 
consistency between graph and conclusion displays in terms of order. For separated 
trials we predicted that consistent conclusions would be verified more quickly and 
lead to fewer errors than inconsistent conclusions. However, for together trials we 
predicted that reordering would often result in representations in which the order of 
terms was inconsistent with their order on the screen. For example, the order BCAB 
on the screen was predicted to be reordered as ABBC and the order BACB as CBBA. 
Thus, we predicted longer verification times and more errors for consistent 
conclusions than for inconsistent conclusions. We found the predicted effects in both 
error rates and verification times. 
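
The predicted reordering can be made concrete with a small sketch. It assumes each premise is written as a two-letter string and that reordering simply swaps the two premises whenever the repeated referent sits at the outer ends; the function name is hypothetical, and the sketch covers only the two orders mentioned above plus the separated case.

```python
def reorder_premises(first, second):
    """Return the premise pairs in the order predicted after reordering.

    Each premise is a two-character string such as "BC". If the repeated
    referent sits at the outer ends (a together trial), the premises are
    swapped so that it ends up in the middle; otherwise they are unchanged.
    """
    if first[0] == second[1]:      # repeated term at the outer ends
        return second, first
    return first, second


print(reorder_premises("BC", "AB"))  # together trial
print(reorder_premises("BA", "CB"))  # together trial
print(reorder_premises("AB", "BC"))  # separated trial
```

After the swap, the on-screen order of the referents no longer matches the represented order, which is why consistent conclusions were predicted to be harder on together trials.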

Although we interpret these findings, and others [2], as evidence that people 
represent graphical information by analogy to space, we have no direct evidence that 
these representations are spatial or that they call on spatial resources. The first aim of 
the experiment to be described here was to provide some such evidence. 

1.2 Spatial Strategies for Thinking 

There is substantial evidence in the literature suggesting that people, at least for some 
tasks, use spatial strategies for thinking. For example, the results of studies using a 
secondary task methodology, where people are asked to perform a primary reasoning 
task whilst carrying out a secondary task designed to tap into verbal or spatial 
resources, suggest that spatial strategies are used for tasks with spatial or temporal 
content [12; 13; 14; 15]. Much of this work has required people to reason about
spatial or temporal relationships and is thus highly relevant to our own graphical
reasoning task, which requires people to verify relationships between graphical
referents.

Another methodology commonly used to study the representations and processing 
resources used in the course of thinking involves individual differences. This 
methodology requires that participants' spatial or verbal ability be measured 
independently of the primary task of interest. The strength of the statistical association 
between performance on an ability measure and performance on the reasoning task is 
taken to reflect the degree to which the resource measured by the ability task is 
involved in the primary task. Individual differences studies have shown that some 
individuals use spatial resources and some use verbal resources when verifying 
pictures against sentences [16] and a similar argument has recently been made about 
syllogistic reasoning [17]. 

In the study to be reported here we hoped to provide evidence for the involvement 
of spatial resources in relational reasoning about premises presented graphically. Such 
a finding would support the claim that people form representations by analogy to 
space when reasoning about graphs [3; 4]. Importantly, we predicted that performance
on an index of the construction of spatial representations for graphical reasoning 
would be positively associated with a measure of spatial resources. We did not predict 
that overall correct solution rates or overall processing times would be associated with 
the use of a spatial strategy. This prediction is motivated by work on reasoning which 
suggests that spatial or verbal strategies may be equally efficient for the 
accomplishment of high-level cognitive tasks [16; 18].

Our final research question concerned relationships between the tendency to use 
spatial representations for graphical reasoning and number judgement. We predicted 
that people who showed evidence of building spatial representations of the 
information contained in graphs would also display evidence of spatial representation 
of number. Although there is now a substantial literature on the neuroanatomy of 
number representation [see 11], much of the behavioural evidence for the spatial
representation of number comes from number judgment tasks where a variety of 
effects have been demonstrated. For example, people appear to find it easier to 
compare two numbers the further apart they are in magnitude [19]. In addition, when 
making parity or relative size judgments about numbers, native speakers of languages 
written left-to-right show a reaction time advantage when they are required to respond 
to larger numbers with their right hand and smaller numbers with their left hand
compared to when the response code is switched. This latter effect is known as the 
SNARC effect and it is interpreted as evidence for a mental number line [9]. In this 
paper we will test for an association between the SNARC effect and the tendency to 
build spatial representations for graphical reasoning. 

2 Experiment 

2.1 Method 

Participants: 48 participants recruited on the University of Durham's Queen's 
Campus took part in this experiment. 

Fig. 2. Two simple bar graphs with separate end terms and descending slope




Fig. 3. Two bar graphs with adjacent end terms and ascending slope 



Materials: Each participant attempted four tasks. One of these - a line bisection task 
- is not relevant to our current purposes and will not be discussed further. The other 
three tasks were a graph comprehension task, a number judgement task and a task 
measuring complex spatial span [20]. Participants completed these tasks individually 
and each task was presented using an IBM clone computer, monitor and keyboard. 
Task order was counterbalanced. We will now describe each task in turn. 

Graph Comprehension Task: The materials for this task comprised 112 trials. Each 
trial consisted of two visual displays, a premise display and a conclusion display, 
which appeared in that order. The premise display contained two simple bar graphs, 
which participants were told always depicted the sales figures for three employees 
over a month. The most successful employee was represented by a bar 900mm in
height, the next most successful by a bar 600mm in height and the least successful by 
a bar 300mm in height. 

We manipulated the position of the non-repeated terms, so that they were either 
together (see Fig. 3 below) or separated by the repeated term (as in Fig. 2 above),
and the slope of the graphs either ascended left-to-right (Fig. 3) or descended
(Fig. 2).

The second display contained one simple bar graph, the conclusion graph (see Fig.
4), specifying a relationship between two of the employees. For integrated trials this
graph consisted of two bars representing just the least and most successful salesmen, 
which were exactly the same height as in the premise graphs. Reversing the order of 
the bars and/or reversing the order of the labels of the bars generated four possible 
conclusion graphs. Of these two were logically valid and two invalid, two had labels 
that were consistent with the order in the premises and two had labels that were 
inconsistent with the order. There were a total of 4 premise graphs, each with 4 
possible conclusions to form 16 conditions. Four sets of materials were constructed
for each condition by using different employees. This resulted in a total of 64 
integrated trials. 

For reasons of experimental control, on certain trials we presented participants with
non-integrated conclusions. The premise graph displays used for non-integrated trials
were the same as those in the integrated trials. However, the conclusions differed from
those presented in integrated trials. Each conclusion was one of the previously shown
premise graphs and depicted a relationship between either the short non-repeated term 
and the middle term or the tall non-repeated term and the middle term. For these 
trials, we manipulated conclusion validity but not consistency. There were 16 (8 valid 
and 8 invalid) conclusion graphs that contained the tallest non-repeated term (tall 
conclusions) and 16 matched trials containing the shortest non-repeated term (short 
conclusions). To ensure that integrated conclusions did not vastly outnumber non- 
integrated conclusions, we also included a further 16 non-integrated trials whose 
conclusions concerned the shorter non-repeated term. 
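
The trial counts given above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and reading the four premise graphs as two slopes crossed with two end-term arrangements is an interpretation of the design rather than something stated explicitly.

```python
# Trial counts for the graph comprehension task, as described in the text.
# Assumption: the 4 premise graphs are 2 slopes x 2 end-term arrangements.
premise_graphs = 2 * 2
conclusions_per_premise = 4   # order of bars and/or order of labels reversed
material_sets = 4             # different employees for each condition

integrated = premise_graphs * conclusions_per_premise * material_sets

# Non-integrated trials: 16 tall conclusions, 16 matched short conclusions,
# and a further 16 trials concerning the shorter non-repeated term.
non_integrated = 16 + 16 + 16

total = integrated + non_integrated
print(integrated, non_integrated, total)
```

Under these assumptions the integrated and non-integrated trials sum to the 112 trials reported for the task.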

For each trial, a fixation cross was presented for 1000ms. This was followed by a 
premise display visible until participants pressed the space bar. Once the space bar 
was pressed a conclusion graph appeared which remained visible until participants 
responded 'yes' or 'no' using the keyboard. We recorded the time taken by
participants to inspect the premise graphs as well as the time they took to respond to 
the conclusion. In addition, we recorded the number of errors made by each 
participant. 

Number Judgement Task: In the number judgement task participants were required 
to indicate whether a target number, presented on a computer screen, was greater or 
less than a referent. The numbers 1-4 and 6-9 were the targets whilst the number 5 
was the referent. Target numbers appeared in the centre of the computer screen one at 
a time and in random order. Each participant attempted two blocks of trials. In the 
first block participants had to press the right button on a button box to indicate that 
the target was bigger than the referent and the left button to indicate that the target 
was smaller than the referent. In the second block of trials a greater than response was 
made with the left hand and a smaller than response was made with the right. 
Participants saw each of the target numbers ten times in each block. Thus, each 
participant made 160 judgements in total. The order of blocks was counterbalanced. 




Fig. 4. A bar graph representing the valid and consistent conclusion that follows from the 
premises depicted in Fig. 1 

Complex Spatial Span: In the complex spatial span task participants were shown sets
of capital letters and their mirror images rotated in different orientations. Participants 
had to decide whether each letter was in its canonical or mirror-imaged form whilst 
keeping track of the orientation of each of the letters in the set. At the end of each set 
participants attempted to recall the orientation of each letter in the set in the order that 
they had appeared. 

Trials were computer presented in PowerPoint. Each set was based on one of the 
following letters: F, J, P, L and R. Letters were presented one at a time and appeared 
in one of 7 orientations. The orientations ranged from 45 through to 315-degrees 
(excluding 0-degrees) and increased in 45-degree increments. The presentation of the 
letters was constrained within sets so that no orientation appeared more than once and 
opposing orientations were not presented successively. This resulted in 70 possible 
letter stimuli (letter x orientation x normal/mirror image). 

The task consisted of 25 letter sets, 5 sets at each level (one of each letter). Sets 
increased in difficulty by virtue of the number of letters they contained. There were 
five sets of two letters, five sets of three letters and so on up to six letters. Each of the
70 possible letter stimuli was used at least once in the experiment. As our design 
called for 100 letter stimuli, 30 of the stimuli were repeated. This repeated set of 30 
contained six randomly selected examples of each of the letter types, three of which 
were reversed and three of which were not. For each set participants were required to 
verify out loud whether the letter was normal or mirror-imaged. After they responded 
the next letter appeared on the screen and when they had responded to all the letters in 
a set a recall screen appeared. The recall screen consisted of a ring of 8 black circles 
representing each orientation (including 0-degrees). Participants were required to 
indicate the orientation of each of the letters within a set in the order they appeared by 
pointing to the corresponding black dot on the recall screen. After they had indicated 
the appropriate number of orientations the first letter of the next trial appeared. Trials 
were presented until the participant made three or more errors in recall on a given 
level. 
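
The stimulus counts for the span task follow directly from the design. The following sketch enumerates them; the names are hypothetical and the code simply restates the numbers given above rather than reproducing the original PowerPoint materials.

```python
from itertools import product

letters = ["F", "J", "P", "L", "R"]
orientations = list(range(45, 360, 45))  # 45 to 315 degrees, 0 excluded
forms = ["normal", "mirror"]

# Every letter x orientation x form combination is one possible stimulus.
stimuli = list(product(letters, orientations, forms))

set_sizes = range(2, 7)   # sets of two letters up to six letters
sets_per_level = 5        # one set per letter at each level
presentations = sum(size * sets_per_level for size in set_sizes)

# Stimuli that must be shown more than once to fill all presentations.
repeats = presentations - len(stimuli)
print(len(orientations), len(stimuli), presentations, repeats)
```

With 70 distinct stimuli and 100 presentations, 30 stimuli have to be repeated, which matches six repeated examples for each of the five letters.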



2.2 Results 

Graph Task: The overall error rate for the graph task was 6.41% (S.D. = 15.82%). 
We removed the data of four participants with an error rate greater than this mean 
plus one standard deviation. Amongst trials where the conclusion presented followed 
logically from the premises, the remaining 44 participants had an error rate of .36%. 
The mean error rate for trials where the conclusion did not follow from the premises 
was 4.19%. In the analyses that follow, we examined reaction times from logically 
valid trials only. 

Inspection Times: We removed the data for all trials where participants had given an 
incorrect response. In addition we removed all trials with a response less than 100 ms 
or more than two standard deviations above the mean for the entire experiment (mean 
= 3349ms, S.D. = 2852ms). These trimming procedures resulted in the removal of 
4.2% of trials. 
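
A minimal sketch of this kind of trimming rule is given below. It is not the analysis script used for the experiment: the trial records are hypothetical, and we assume that the mean and standard deviation are computed over the pooled correct trials.

```python
from statistics import mean, stdev

def trim_trials(trials, floor_ms=100.0, n_sd=2.0):
    """Drop error trials, then drop latencies below the floor or more than
    n_sd standard deviations above the mean of the remaining trials."""
    correct = [t for t in trials if t["correct"]]
    rts = [t["rt_ms"] for t in correct]
    ceiling = mean(rts) + n_sd * stdev(rts)
    return [t for t in correct if floor_ms <= t["rt_ms"] <= ceiling]

# Hypothetical trials: one anticipation, one very slow trial, one error.
example = [{"rt_ms": rt, "correct": True}
           for rt in (50, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 20000)]
example.append({"rt_ms": 2000, "correct": False})

kept = trim_trials(example)
print(len(kept), "of", len(example), "trials retained")
```

Note that a single extreme latency inflates both the mean and the standard deviation, so the upper cut-off in this rule depends on the outliers it is meant to remove.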

Table 1. Mean verification times (ms) by Slope, End Terms and Consistency.

                    Descending               Ascending
End Terms:      Separate   Together     Separate   Together
Consistent        1738       1918         1787       1765
Inconsistent      1960       1847         1805       1792



We carried out a 2 (Slope: Ascending vs. Descending) x 2 (End Terms: Together
vs. Separate) within-participants ANOVA on inspection times. None of the effects in
this analysis were significant, although the effect of End Terms (F(1, 43) = 1.11,
MSE = 425295, p = .30), previously found to be significant in experiments on bar and 
line graphs [4], was in the direction predicted. The mean inspection time for premise 
sets where the end terms were apart was shorter than the mean for premise sets where 
the end terms were together (2855ms vs. 2958ms). 

Verification Times: All trials to which a correct response had been given with a 
latency of between 100 ms and the mean (1985ms) plus two standard deviations 
(1226ms) were included in this analysis. This meant that 3.61% of the data was not 
included. A 2 (Consistency) x 2 (Slope) x 2 (End Terms) within-participants ANOVA was
carried out on this data. The analysis revealed a significant main effect of Slope (F(1,
43) = 5.78, MSE = 93477, p < .03) and a significant interaction between Consistency
and End Terms (F(1, 43) = 4.54, MSE = 98026, p < .04). Although this finding
replicates the figural effect previously found in graphical reasoning [3; 4], it is
qualified by a marginally significant three-way interaction (F(1, 43) = 3.55, MSE =
141458, p < .06). Tests for simple interaction effects carried out on the means
involved in the three-way interaction (see Table 1) revealed a significant
interaction between Consistency and End Terms for Descending trials (F(1, 43) =
6.77, MSE = 139849, p < .02) but not for Ascending trials (F(1, 43) < .01). Whilst the
figural effect is evidence of representation by analogy to space [see 3; 4], it is
somewhat unclear as to why it should be observed for descending trials only. People 
prefer to build such representations from the top down [21; 22]. This preference may 
have made trials where the premise graphs descend left-to-right particularly amenable 
to solution via the construction and interrogation of a spatial array. Thus, participants 
may have been more likely to build and interrogate spatial representations when the 
premise graphs had a descending slope that made use of such a strategy easy. 
Certainly the verification times in Table 1 suggest that participants used different 
strategies depending on the slope of the premise graphs. 
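
The pattern behind the simple interaction effects can be seen by computing, from the cell means in Table 1, the cost of an inconsistent conclusion for each premise type. The sketch below is a descriptive illustration using those means only; it does not reproduce the F tests, which require the trial-level data, and the function names are ours.

```python
# Mean verification times (ms) from Table 1.
means = {
    ("Descending", "Separate"): {"Consistent": 1738, "Inconsistent": 1960},
    ("Descending", "Together"): {"Consistent": 1918, "Inconsistent": 1847},
    ("Ascending", "Separate"): {"Consistent": 1787, "Inconsistent": 1805},
    ("Ascending", "Together"): {"Consistent": 1765, "Inconsistent": 1792},
}

def consistency_cost(slope, end_terms):
    """Extra time (ms) taken to verify inconsistent conclusions."""
    cell = means[(slope, end_terms)]
    return cell["Inconsistent"] - cell["Consistent"]

def interaction_contrast(slope):
    """Difference in consistency cost between separate and together trials."""
    return consistency_cost(slope, "Separate") - consistency_cost(slope, "Together")

for slope in ("Descending", "Ascending"):
    print(slope, interaction_contrast(slope))
```

For descending graphs the consistency cost is 222 ms on separate trials and reverses to -71 ms on together trials, a contrast of 293 ms; for ascending graphs the contrast is only -9 ms.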

Number Judgement: The mean error rate for this part of the experiment was 2.15% 
and the highest individual error rate was 12.5%. A 2 (Block) x 8 (Number) entirely
within-participants ANOVA was carried out on participants' mean judgement times for
correct trials only. The means involved in this ANOVA are presented in Fig. 5. All of
the effects tested by the ANOVA were significant. Of most interest here are the
significant main effect of Block (F(1, 47) = 12.66, MSE = 45555, p < .001) and the
interaction between Block and Number (F(7, 329) = 3.28, MSE = 3748.3, p < .005).
Planned comparisons confirmed that whereas a large SNARC effect was reliably 
observed for all of the numbers greater than five, for numbers less than five a reliable 
effect was only observed for the number 1. This finding may have been caused by our
decision not to control for handedness. Thus, our predominantly right-handed
participants may have been faster to respond to numbers less than five with their right
hand than they were to respond to numbers greater than five with their left.

This caveat aside, the results of this experiment clearly show a SNARC effect. 
Participants respond faster when they use their left hand to respond to small numbers 
and their right hand to respond to large numbers than when the response code is 
reversed. 

Individual Difference Analyses: The mean complex spatial span of the 44 
participants in our sample whose data was included in the analysis of the graphical 
reasoning task was 3.34 (S.D. = 1.10). 

In order to test hypotheses about the relationship between complex spatial span and 
the tendency to represent graphical and numerical information spatially, we computed 
spatial indices for both the number judgement and graphical reasoning tasks. The 
index for the number judgement task (NSI) was simply the difference between mean 
reaction times in the R-L and L-R conditions. The mean of this index was 60.74 ms 
and its standard deviation was 108.21 ms. Whilst the difference is positive as we 
would expect following our finding of a SNARC effect, people's scores on this index 
vary substantially. We will return to this point in the Discussion. 

We created the spatial index for the graph task (GSI) by subtracting the mean 
inspection times for premise graphs in which the non-repeated terms appeared separately
from the inspection times for trials where the non-repeated terms appeared together. A 
positive score on this index indicates that people are spending more time inspecting 
graphs where the end terms appear together. Such graphs require re-ordering if an 
integrated spatial representation is to be constructed. Twenty-six participants had a 
positive score on this index and 18 had a negative score. The mean score was 103.40 
ms (S.D. = 652.15 ms). 
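
Both indices are simple difference scores computed per participant. The sketch below shows the calculation on hypothetical latencies. The function and argument names are ours, and we assume that the R-L condition is the block in which larger numbers were answered with the left hand, so that a positive NSI corresponds to a SNARC effect.

```python
from statistics import mean

def graph_spatial_index(together_ms, separate_ms):
    """GSI: mean inspection time for together premises minus separate premises."""
    return mean(together_ms) - mean(separate_ms)

def number_spatial_index(incongruent_ms, congruent_ms):
    """NSI: mean reaction time in the incongruent block minus the congruent block.
    Assumption: incongruent = larger numbers answered with the left hand."""
    return mean(incongruent_ms) - mean(congruent_ms)

# Hypothetical latencies (ms) for one participant.
gsi = graph_spatial_index(together_ms=[3000, 3100], separate_ms=[2900, 2950])
nsi = number_spatial_index(incongruent_ms=[520, 540], congruent_ms=[470, 490])
print(gsi, nsi)
```

A participant with a positive GSI is classed as showing evidence of reordering; a positive NSI indicates slower responses under the reversed response code.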

Fig. 5. Interaction between Number and Block from the SNARC task

A t-test comparing the mean complex span scores of participants with a positive 
score on the GSI to those with a negative score was significant (t(42) = 2.39, p < .05). 
The mean span score of participants who showed evidence of reordering was 3.65
(S.D. = 1.06) whereas the mean for participants who did not reorder was 2.89 (S.D. =
1.02). Importantly, spatial span was not found to be associated with error rates on the
graphical reasoning task (r = .09, p > .5). Nor was it associated with mean overall
inspection times for the graphical reasoning task (r = -.08, p > .6) or mean overall 
verification times for the graph task (r = -.17, p > .2). Thus, complex spatial span is 
associated with the tendency to use a spatial representational strategy in the course of 
attempting the graph task rather than with people's ability on the task. 
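
As a rough check, the group comparison can be recomputed from the summary statistics reported above. The sketch assumes a standard independent-samples t-test with a pooled variance estimate, which is consistent with the 42 degrees of freedom reported; the function name is ours.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# 26 participants with a positive GSI, 18 with a negative GSI.
t, df = pooled_t(3.65, 1.06, 26, 2.89, 1.02, 18)
print(round(t, 2), df)
```

The value obtained from the rounded means and standard deviations is close to, but not identical with, the reported t(42) = 2.39; a small discrepancy is expected because the inputs are rounded to two decimal places.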

Finally, we observed a moderate correlation between scores on the GSI and NSI 
indices (r = .34, p < .04). Even when we controlled for the effects of complex spatial 
span the correlation remained marginally significant (r = .29, p < .07). This result 
suggests that although there is a positive association between performances on the 
spatial indices from each of these tasks, that association is not due to the demands that 
each task places on the ability to temporarily store and concurrently process 
information in spatial memory. We will return to this issue below. 
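
Controlling for a third variable in this way amounts to computing a first-order partial correlation. The sketch below gives the standard formula. The correlations of each index with complex spatial span are not reported here, so the values used in the example are purely hypothetical.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# r_xy = .34 is the reported GSI-NSI correlation; the two correlations
# with span (.30 each) are hypothetical placeholders for illustration.
example = partial_correlation(0.34, 0.30, 0.30)
print(round(example, 2))
```

If neither index were correlated with span, the partial correlation would equal the zero-order correlation; the more both indices share with span, the more the partial correlation shrinks.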

3 Discussion 

The results described in this paper support the claim that some people, at least some 
of the time, build spatial representations for reasoning about graphs. This is in 
contrast to the assumptions of several recent models of graph comprehension [5; 6]. 
For graphs with a descending slope only, our data shows evidence of the reordering 
strategy that we argue is diagnostic of the use of spatial representations in the task. In 
addition, the tendency to inspect graphs that require reordering for longer than graphs 
that do not is positively associated with complex spatial span, a measure of spatial 
storage and processing. People who display this inspection time difference have
significantly higher complex spatial span scores than people who do not. This finding
suggests that the representational medium used for the construction and manipulation of
the representations used by some people for graphical reasoning is spatial in nature.
Some people appear to represent graphical information by analogy to space. 

The second finding of interest in this paper is our replication of the SNARC effect 
usually interpreted as indicating that one aspect of number representation is position 
along an analogical number line. We have shown that there are individual differences 
in people's tendency to represent numerical information by analogy to space and that 
people who represent number by analogy to space also tend to represent graphical 
information in the same manner. We will return to the significance of this finding for 
claims that have been made about the use of spatial representations for mathematical 
cognition [11]. 

Before considering the implications of our results, it is very important to note that 
the relationship between spatial ability and performance on our graphical task is 
unlikely to be due to a higher-order relationship between general intellectual ability 
and task performance. The GSI that we used as a measure of the tendency to represent 
information spatially was not associated with the number of errors made on the task, 
overall inspection times or overall verification times. Neither was complex spatial 
span significantly associated with measures on any of these variables. Only the
tendency to reorder as indexed by the GSI is associated with complex spatial span. 

3.1 Individual Differences in Graph Comprehension 

Although we have observed an association between spatial abilities and the tendency 
to reorder the information contained in graphical premises in a manner consistent with 
the use of spatial representations for reasoning, we have not found associations 
between spatial ability or reordering and overall performance on the task. This result 
suggests that although some people may use spatial representations for thinking about 
graphs, the use of such representations does not lead to better performance. This is in 
contrast with previous work on other diagrammatic tasks [e.g. 23] showing that 
people low in spatial ability make more errors than people high in spatial ability. This 
difference in error rates due to spatial ability increased as the diagrammatic task 
became more complex. An explanation of these results may be given in terms of 
capacity limits [24] where more complex problems are more likely to exceed the 
capacity limitations of people low in spatial ability thus leading to greater differences 
due to spatial ability as the problems get more difficult. 

Such a capacity explanation might be applied to our finding that the association 
between spatial ability and error rates in graph comprehension is not statistically 
significant. Perhaps the graphical reasoning problems that we used were too simple to 
reach the capacity limits of our participants. Hence they did not discriminate between 
people of differing spatial ability and no association was found between error rates 
and ability. 

An alternative account of our results is that the graph task may be accomplished 
using spatial or verbal resources. The GSI is a measure of the extent to which people 
use a particular representational strategy in order to perform the task and scores on the 
GSI allow us to separate out people who use a predominantly spatial strategy on the
task. However, neither a verbal nor a spatial strategy leads to fewer errors or faster 
inspection and verification times. Interestingly, recent research on syllogistic 
reasoning [18] has classified people, based on their external representations for 
reasoning, as using spatial or verbal representations. There were no differences in 
reasoning performance between the two groups. These results, and our own, leave 
open the possibility that some people may efficiently carry out some graph-based 
tasks using predominantly verbal resources. Verbal strategies might include, but are 
not limited to, remembering the absolute value of each of the referents in the premise 
displays or representing the relationship described in each of the premise graphs in 
linguistic form [see 25]. 

3.2 Representation by Analogy to Space - A Domain-General Ability? 

The question that we would like to consider for the remainder of this paper is whether 
the representation of abstract concepts by analogy to space is a domain-general 
strategy for abstract thought or whether, in the course of cognition, we build a number 
of similar but distinct and domain-specific spatial representations. The former 
position might be attributed to proponents of mental model theory [26] who assume 
that the same spatial representations underlie transitive reasoning about space and 
time. Similarly, the notion that spatial schemas [27] underlie many cognitive abilities
seems to presuppose a domain-general representational ability. However, the contrary 
position appears to have been adopted particularly with respect to number cognition. 
For example, Dehaene and his colleagues [11] have claimed that we have evolved a 
domain-specific system located in the parietal cortex, for the spatial representation of 
number. They interpret evidence that sensitivity to number is present very early in 
development, and in other species, as supporting their argument that spatial 
representation for number is evolutionarily inherited. They further claim that the 
representation of number is abstract. This claim is based on the insensitivity of several 
of the signature effects in the literature to variations in stimulus input, as well as on 
the results of lesion and imaging studies. 

We are unconvinced by arguments for domain-specificity. One problem for the 
argument is our finding that roughly one third of the participants in our study did not 
display a positive score on the NSI. These participants were worse off when 
responding to high numbers with their right rather than their left hand. If an abstract 
spatial representation for number is domain-specific and evolutionarily determined 
then we might expect almost everyone to display the SNARC effect. This finding 
suggests, as has been acknowledged in the reasoning literature [16; 26], that different 
spatial or verbal representational strategies may be used for the accomplishment of 
high-level cognitive tasks such as number comparison. In addition, we observed a 
positive correlation between the number of errors made by participants on the task 
and the NSI index (r = .38, p < .02). That is, a large SNARC effect is associated with 
a higher error-rate on the task. We are unsure as to why an evolved and
domain-specific representation for number would result in error. It is, perhaps, more likely that
some people use a general spatial strategy for representing number and that this 
strategy, whilst useful in many contexts, may not be appropriate for some tasks. 

A second potential problem for the domain-specificity argument is the overlap that 
exists between effects found in the literature on numerical cognition and those found 
in the literature on relational reasoning. For example, the finding that it is easier to 
compare two numbers the further apart they are in magnitude has a direct analogue in 
the reasoning literature. That is, the further apart are two referents in a spatial 
representation of some relational premises, the faster people are to verify a conclusion 
concerning them [28]. Given this overlap, we were unsurprised by the association we 
observed between performance on the indices of a spatial representational strategy 
taken from the graph and number tasks. Although not everyone appears to use spatial 
representation for graphical reasoning and number judgment, the tendency to do so on 
one task is associated with the tendency to also do so on the other. 

Of course, an association does not necessitate a common cause. Analysis of the 
processes involved in the graph and number tasks suggests that although they share a 
number of common features, such as a requirement that a spatial or verbal 
representation be constructed, maintained and interrogated, they differ in several 
respects. For example, a spatial representational strategy on the graphical reasoning 
tasks calls for a mapping between actual and representational space and reordering of 
referents [see 29]. Our GSI may have been particularly sensitive to these aspects of 
the task and the fact that they place concurrent demands on spatial storage and 
processing may explain the association we observed between complex spatial span 
and performance on the GSI. Intuitively, the SNARC task often appears to call for the 
inhibition of one response over another, particularly when participants must respond to
small numbers with their right hand. Scores on the NSI may partly reflect the ability
to inhibit one response in favour of another. In addition, the tasks may place very
different demands on spatial storage and processing resources, which may explain
why, when we control for complex spatial span, the association between performance
on the spatial indices is still relatively strong. In short, the tasks are sufficiently
complex and different that the statistical association observed between them is
difficult to interpret.

Despite the complexities of interpretation that we have outlined above, we 
speculate that there are two possible explanations for the association between the 
graph and number tasks. Recently, Dehaene and colleagues [30] have suggested that 
the processing of number has three components, each of which relies on separate 
parietal circuitry. One of these components is domain-general and is responsible for 
attentional orientation onto a variety of spatial dimensions. Perhaps this attentional 
process is involved in both the graph and number tasks. Its involvement in both tasks 
would account for the association we observed between them. Another possibility is 
that a general magnitude system underlies much of our spatial and temporal cognition 
[31]. This system may partly support both graphical reasoning and number 
comparisons. Interestingly, the first of these explanations preserves Dehaene's 
domain-specificity argument as the tripartite system that he suggests includes a 
parietal circuit specifically dedicated to domain-specific representation of number. 
The second explanation does not. As our data does not differentiate between these 
points of view, further work will be required to tease them apart. 

3.3 Conclusions 

Our results suggest that some people represent information presented graphically by 
analogy to space. They appear to build spatial representations of graphical 
information and the time it takes them to reorder information in premises presented 
graphically is associated with their spatial capacity. We also observed that the size of 
the SNARC effect in number cognition is associated with people's tendency to reorder 
information in graphical premises. This led us to consider whether people possess just 
one mechanism that allows them to represent abstract concepts by analogy to space or 
whether they possess a variety of domain-specific representational abilities. Although 
the data that we have reported here bears on this issue, additional experimental work 
is required to answer the question. 

Abstract. In recent studies of graphical dialogue, the level of communicative
interaction has been identified as an important influence on
the form of graphical representations. Here, we report the results of a 
‘Pictionary-like’ concept drawing experiment which compares the con- 
tribution of repetition and level of interaction to changes in the form of 
graphical representations. In one version of the task, participants repeat- 
edly produce drawings of the same set of items. In the other participants 
produce drawings of different items. In both cases, when the level of com- 
municative interaction between the participants varies, the form of the 
representation produced by the pair also varies. These results suggest 
that three different processes are contributing to changes in graphical 
form in these tasks: practise, reduction and mutual-modification. We 
propose that the last of these, mutual-modifiability is important for the 
evolution of new conventions. 



1 Introduction 

Although drawing is often treated as a species of monologue, experimental stud- 
ies are providing evidence of parallels between verbal and graphical dialogue [1]. 
For example, participants in graphical exchanges match each other’s style of
drawing more often than would be predicted by chance [2]. This echoes the
accommodation or ‘entrainment’ phenomena, which include the matching of
lexicon, syntax, and semantics, that have been identified for verbal dialogue [3].
Similarly, it has been shown that under some circumstances, patterns of graph- 
ical turn-taking emerge that are similar to those found in conversation [4]. 

The parallels between verbal and graphical dialogue suggest that graphical 
representations, like language, may be flexible and sensitive to local interactional 
context. Experiments on graphical dialogue tasks, in which pairs of subjects 
communicate by drawing, have provided evidence of several effects of interaction 
on the form of graphical representations (see [1] for an overview). In these tasks 
one member of a pair, the director, draws a target item and the other member,
the follower, tries to use this drawing to guess the identity of the original target. 
Participants are free to draw anything they like subject to the constraint that 
they do not use letters or numbers. 

Experiments have provided evidence that the form in which the target is
represented in these tasks is affected by whether one or both participants can
draw, even though only the director knows the identity of the target. For
example, in the Music Drawing task (described in [2]), where the target items
are pieces of music, directors favour figurative drawings (e.g., faces,
figures, landscapes, objects) when only one person can draw and more abstract,
graph-like representations when both people can draw. In the Concept Drawing
task (described in [5]), where the targets are words such as “television” and
“homesick”, directors produce smaller, less complex drawings when both people
can draw than when only one of them can.

These findings raise the possibility of developing an account of the evolution 
of graphical conventions in interaction and its effects on the emergence of icons 
and symbols. One prerequisite for any such account is an analysis of the notions 
of ‘icon’ and ‘symbol’ and related representational concepts. The concern of the 
present paper is, however, primarily empirical. 

In the Music Drawing task, items never repeat, whereas in the Concept Drawing
task items always repeat. Prima facie, repetition and interaction contribute,
in different ways, to the changes in the forms of drawings observed in these
studies. Firstly, repetition per se can lead to changes in representational
form. As Bartlett established [6], repeated production can promote a
systematic evolution of drawings toward more schematic or symbolic
representational forms.

The role of interaction is also likely to be different for repeating and
non-repeating items. Like the verbal definite reference tasks [7], pairs in the Concept
Drawing task are engaged in refining the ‘graphical referring expressions’ for 
particular items. They ground some initial representation for an item and then 
abbreviate it to make it more efficient to use. Interaction in this case aids both 
the initial grounding process and the subsequent refinement and reduction. 

By contrast, in the Music Drawing task, the need to represent multiple
instances creates pressure to develop a system of conventions that can
generalise to new items. In this case interaction must underwrite attempts to
co-ordinate on a system of conventions. This sort of co-ordination requires
more than grounding: in addition to co-ordinated use of particular
representations, it must support negotiation of the representational system or
notation of which they form part. As we have argued elsewhere, this requires
mechanisms of interaction that allow participants to localise and repair
breakdowns in understanding [2, 8].

This paper reports an experimental study designed to investigate the relative 
influence of item repetition and level of interaction on drawings produced in the 
Concept Drawing Task. Our initial hypotheses were that: 

1. Repetition promotes reduction of graphical referring expressions. 

2. Interaction facilitates reduction through grounding. 

3. Non-repeating items promote richer forms of interaction.

2 Method

The present experiment uses a version of the Concept Drawing task [5]. This
task is a graphical analogue of the Tangram task that Clark and colleagues used
to study the development of verbal referring expressions in conversation (see
e.g., [7]).

In outline, subjects work in pairs and take one of two roles: director or
follower. The director has an ordered set of 12 target concepts. The list of
targets is constructed so that there are four categories of confusable items;
in the present study these are games, moods and feelings, plants and animals,
and objects. The follower also has a list of the 12 target items, but in
random order and with four distractors added. So an example list for the
follower might be: ‘football’, ‘lazy’, ‘rugby’, ‘bored’, ‘filing cabinet’,
‘confused’, ‘active’, ‘squirrel’, ‘cat’, ‘kangaroo’, ‘safe’, ‘baseball’,
‘golf’, ‘tv’, ‘dog’, ‘photograph’. In each round, the director
draws a picture of each target concept and the follower tries to identify the right 
item from their list. The director works through all twelve items in this way until 
their list is exhausted. The follower is allowed to select each item from their list 
only once in each round and the pair do not receive any direct feedback about 
whether their choice was correct. 

In the original Tangram studies, and in the more recent studies of graphical 
dialogue, the director and follower repeat the same set of target items over 
several rounds but in a new order each time. In order to investigate how level of 
interaction and repetition of items interact, the present experiment involved two 
manipulations of this basic set up. One was whether the same set of target items 
repeated on each round or if a new set of targets was presented on each round. 
The second was whether only the director or both the director and follower 
could draw on the whiteboard. Because of potential difficulties adjusting to the 
different whiteboard configurations, subjects only performed the task under one 
set of experimental conditions. This resulted in a two by two factorial design with
item repetition (+/- Repetition) and interaction (+/- Blocking) as
between-subjects factors.

2.1 Subjects 

Forty-eight pairs of subjects were recruited from undergraduate and
postgraduate students at Queen Mary, University of London.

2.2 Materials 

The experimental whiteboard tool described in [9] was used to administer the
Concept Drawing task and the experimental manipulations. The whiteboard
consists of a basic drawing area as illustrated in Figure 1. This was configured
differently for the two task roles. The director’s task bar, illustrated in Figure 2,
displays the target set of concepts, listed in the order they are to be drawn. The 
current target item is highlighted and any that have already been completed are 
greyed out. The follower’s task bar displays the same set of concepts, plus the
four additional distractor items, as response buttons in the task bar (Figure 1).
The distractors are added to ensure that choice of the last item is not trivial. The 
response buttons are randomly ordered on each round and can only be used once 
after which they become greyed out. The follower also has an additional ‘Already 
Used’ button to cover cases in which they decided that they had already used 
the response button appropriate for the current trial on a previous trial. This 
avoids forcing followers into errors that could themselves have knock-on effects 
on subsequent trials. 

Once the follower has pressed a response button for an item, further drawing 
by the director is prevented. A dialogue box appears confirming that a choice 
has been made and once both the follower and director have pressed ‘okay’ they 
proceed to the next item. The whiteboard drawing area can also be configured, as 
appropriate, to ensure that directors but not followers can draw. The whiteboard 
records all drawing activity in log files that are processed to provide statistics 
for data analysis. 

The sets of concepts were drawn from a pool of 96 items made up of the four 
categories (games, moods and feelings, plants and animals, objects) with 24 items 
in each category. Six sets of 16 concept items (12 targets plus four distractors) 
were selected at random subject to the constraint that no item appeared twice 
and that each set had equal numbers of items from each category. 
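
The construction of the item sets can be illustrated with a minimal sketch. It is not the procedure actually used in the experiment: the item labels are invented placeholders, and the split of each set into three targets and one distractor per category is an assumption, since the text only states that each set of 16 contains equal numbers of items from each category.

```python
import random

CATEGORIES = ["games", "moods and feelings", "plants and animals", "objects"]


def build_item_sets(pool, n_sets=6, per_category=4, seed=0):
    """Partition a pool {category: [items]} into n_sets disjoint sets,
    each holding per_category items from every category."""
    rng = random.Random(seed)
    sets = [[] for _ in range(n_sets)]
    for category, items in pool.items():
        if len(items) != n_sets * per_category:
            raise ValueError(f"category {category!r} has {len(items)} items")
        shuffled = list(items)
        rng.shuffle(shuffled)
        # Deal consecutive chunks of the shuffled category to the sets,
        # so no item can appear in more than one set.
        for i, item_set in enumerate(sets):
            chunk = shuffled[i * per_category:(i + 1) * per_category]
            item_set.extend((category, item) for item in chunk)
    return sets


def split_targets(item_set, rng):
    """Assumed split: one distractor per category, the rest are targets."""
    targets, distractors = [], []
    for category in sorted({c for c, _ in item_set}):
        members = [entry for entry in item_set if entry[0] == category]
        rng.shuffle(members)
        distractors.append(members[0])
        targets.extend(members[1:])
    return targets, distractors


# Placeholder pool: 24 invented labels per category (96 items in total).
pool = {c: [f"{c} #{i}" for i in range(24)] for c in CATEGORIES}
item_sets = build_item_sets(pool)
targets, distractors = split_targets(item_sets[0], random.Random(1))
print(len(item_sets), len(targets), len(distractors))  # prints "6 12 4"
```

Shuffling within each category and dealing out fixed-size chunks satisfies both stated constraints at once: no item appears twice, and every set holds four items from each category.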




Fig. 1. Follower’s Whiteboard Window 

Fig. 2. Director’s Task Bar



2.3 Procedure 

Pairs of subjects were assigned randomly to conditions and roles (director or 
follower). They were seated in opposite corners of an L-shaped room with no 
direct line of visual contact between them. They were given instructions on the 
use of the shared whiteboard followed by two practice trials. 

Pairs were instructed that on each round the director would work through
the list of target items. They could draw anything they liked subject to the
restriction that no letters or numbers were to be used. Followers could select
their response by pressing a button but once they made a choice they would 
not be able to undo it. If they were sure they had already used the appropriate 
response they could select the ‘Already Used’ button. They were also explicitly 
instructed, as appropriate, that either only the director could draw or that both 
the director and follower could draw. Pairs in the repeating condition were not 
explicitly told whether items would repeat. On debriefing, most volunteered that
they had seen the same items several times.

The six sets of items were assigned to pairs so that each set appeared equally 
often in each condition i.e., one per pair in the repeating conditions. For the 
non-repeating pairs the order of presentation was counterbalanced so that each 
of the six sets of items occurred equally often in each round of the experiment. 
Order of conditions was rotated. 
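
One way to meet the counterbalancing constraint for the non-repeating pairs is a cyclic (Latin square) rotation. The sketch below is an illustration only: it assumes six rounds with one item set per round, and 12 non-repeating pairs per Blocking condition (48 pairs spread evenly over four cells). Neither figure is stated explicitly in the text, and the actual assignment scheme may have differed.

```python
def rotation_schedule(n_pairs, n_sets=6, n_rounds=6):
    """Return schedule[pair][round] = index of the item set shown.

    Each pair starts at a different offset and steps through the sets
    cyclically, which yields a Latin square when n_rounds == n_sets.
    """
    return [[(pair + rnd) % n_sets for rnd in range(n_rounds)]
            for pair in range(n_pairs)]


def counts_per_round(schedule, n_sets=6):
    """For each round, count how many pairs see each item set."""
    n_rounds = len(schedule[0])
    return [[sum(1 for row in schedule if row[rnd] == s) for s in range(n_sets)]
            for rnd in range(n_rounds)]


schedule = rotation_schedule(n_pairs=12)
print(counts_per_round(schedule)[0])  # prints "[2, 2, 2, 2, 2, 2]"
```

With the number of pairs a multiple of six, every set occurs equally often in every round, and each pair sees every set exactly once.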

3 Results 

The results are presented in three sections. First, the general findings for task 
performance for pairs. Second, the overall pattern of drawing activities. Finally, 
a more detailed analysis of the individual contributions of the director and fol- 
lower to the drawings. 



Task Performance. In order to provide a measure of the perceived difficulty
of each item for directors, planning time (the time elapsed between the start
of a trial and the first drawing activity) was calculated. This was analysed in
a 2 by 2 analysis of variance with Blocking and Repetition as between-subjects
factors.
Fig. 3. Example Drawings of “Hedgehog” in the Repeating conditions for each round

Table 1. Response Time (msecs)

             + Repetition   (S.D.)     - Repetition   (S.D.)
+ Blocking   11,937         (15,860)   17,406         (12,952)
- Blocking   12,307         (18,378)   25,177         (23,265)


This showed a simple main effect of Repetition (F(1,1724) = 93.09, p = 0.00;
see footnote 1) but no effect of Blocking and no interaction (F(1,1724) = 0.68,
p = 0.41 and F(1,1724) = 1.33, p = 0.25 respectively). On average, directors
took approximately twice as long to
start a non-repeating, i.e. new, item (4.4 seconds) as they did to start drawing 
a repeating item (2.2 seconds). Planning time was unaffected by whether the 
follower could draw. 

Response time, the time between the onset of drawing and the follower making
a choice, was also assessed in an analysis of variance. This showed simple
main effects of Repetition (F(1,1724) = 109.63, p = 0.00) and Blocking
(F(1,1724) = 21.60, p = 0.00) and a significant Repetition x Blocking
interaction (F(1,1724) = 17.8, p = 0.00). As the means in Table 1 show,
responses to non-repeating items were consistently slower than responses to
repeating items. If both participants could draw, the response time for
non-repeating items was further retarded; however, this had no effect on
repeating items.
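
The two timing measures defined above can be computed from per-trial timestamps. The following sketch assumes a hypothetical record with three fields (trial start, onsets of drawing actions, and the time of the follower's response); the actual format of the whiteboard log files is not described in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trial:
    """Hypothetical per-trial record; all times in msec from session start."""
    start_ms: int                # start of the trial
    stroke_onsets_ms: List[int]  # onset of each drawing action
    response_ms: int             # follower presses a response button


def planning_time(trial: Trial) -> Optional[int]:
    """Time from the start of the trial to the first drawing activity."""
    if not trial.stroke_onsets_ms:
        return None  # nothing was drawn on this trial
    return min(trial.stroke_onsets_ms) - trial.start_ms


def response_time(trial: Trial) -> Optional[int]:
    """Time from the onset of drawing to the follower's choice."""
    if not trial.stroke_onsets_ms:
        return None
    return trial.response_ms - min(trial.stroke_onsets_ms)


example = Trial(start_ms=0, stroke_onsets_ms=[2200, 5000], response_ms=14137)
print(planning_time(example), response_time(example))  # prints "2200 11937"
```

The example values are invented and chosen only to be of the same order as the reported means.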

The accuracy of identifications of the target concepts, ignoring responses
where the follower had selected “Already Used”, was assessed in an analysis of
variance following the same structure as before. This showed no simple main
effects of Repetition or Blocking (F(1,677) = 0.65, p = 0.42 and F(1,677) =
1.33, p = 0.25 respectively) but did show a reliable two-way interaction
(F(1,677) = 5.90, p = 0.02).
The data in Table 2 show that accuracy was high in all conditions. When pairs
had the ability to interact directly on the whiteboard they were more accurate
with repeating items. When only the director could draw on the whiteboard they
were more accurate with non-repeating items.

1 A criterion level of p < 0.05 was adopted for all statistical tests.

Fig. 4. Example drawings of “Hedgehog” in the NonRepeating conditions for each
round



Drawing Activities. To provide a measure of ‘ink’, the total number of pixels
used was calculated from the whiteboard log files. Fig. 5 illustrates the
trends in the amount of drawing across rounds in each condition. As
participants become more experienced with the task, stable differences between
the conditions emerge. The pattern of activity in the last three rounds
indicates an interaction between the manipulations of Blocking and Repetition.

The reliability of this pattern was assessed in a two by two analysis of
variance, with Blocking and Repetition as between-subjects factors. This
showed a simple main effect of Repetition (F(1,1724) = 22.91, p = 0.00), no
simple main effect of Blocking (F(1,1724) = 0.23, p = 0.63) and a significant
Blocking x Repetition interaction (F(1,1724) = 59.24, p = 0.00). As Table 3
shows, where both participants
can draw, non-repeating items generate approximately a third more drawing 
than repeating items. Where only the director can draw there is no difference in 
the amount of drawing produced for repeating and non-repeating items. 
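
For readers unfamiliar with the analysis, the F ratios for a balanced two by two between-subjects design can be computed as in the sketch below. This is a textbook calculation on invented data, not the authors' analysis script; it returns F ratios and degrees of freedom only, without p values. In a four-cell design the error degrees of freedom are the number of observations minus four, so the reported value of 1724 would correspond to 1728 observations, a count that is our inference rather than something stated in the text.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way between-subjects ANOVA.

    cells maps (a_level, b_level) -> list of observations, with the same
    number of observations in every cell. Returns a dict mapping each
    effect to a tuple (F, df_effect, df_error).
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for the two main effects, the interaction and error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a / ms_err, df_a, df_err),
        "B": (ss_b / df_b / ms_err, df_b, df_err),
        "AxB": (ss_ab / df_ab / ms_err, df_ab, df_err),
    }


# Invented toy data: factor A = Repetition, factor B = Blocking.
demo = {
    ("repeat", "blocked"): [1, 2, 3],
    ("repeat", "unblocked"): [2, 3, 4],
    ("nonrepeat", "blocked"): [3, 4, 5],
    ("nonrepeat", "unblocked"): [8, 9, 10],
}
result = two_way_anova(demo)
print(result["AxB"][1], result["AxB"][2])  # prints "1 8"
```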



Table 2. Percentage of Items Correctly Identified

             + Repetition   - Repetition
+ Blocking   87%            89%
- Blocking   92%            87%

Fig. 5. Amount of Drawing Across Rounds



Following Fay et al. [5], a psychophysical measure of the visual complexity of
the drawings was calculated, based on the algorithm described by Pelli et al.
[10]. An analysis of variance showed the same pattern of results as for number
of pixels drawn: a simple main effect of Repetition (F(1,1724) = 24.82,
p = 0.00), no simple main effect of Blocking (F(1,1724) = 0.10, p = 0.75) and
a significant Blocking x Repetition interaction (F(1,1724) = 41.72, p = 0.00).
Fig. 6 illustrates the interaction between the effects of Repetition and
Blocking. Pairs who could interact directly on the whiteboard produce more
complex drawings for non-repeating items than for repeating items. Where they
cannot interact directly, item Repetition has little effect on the visual
complexity of the drawings produced.

Comparison of Fig. 5 and Fig. 6 shows a very similar pattern of results
for these two measures of the drawing activities and they correlate highly (r =
0.94). This is due, in part, to some limitations in using the Pelli et al. measure of
complexity in this context. The technique used to compute it is most appropriate
in situations where the drawings have significant filled areas. Time constraints
and the drawing instruments involved in this task make it difficult for subjects
to produce filled areas. It should also be noted that the algorithm does not
differentiate between different colours used in a drawing. To avoid misleading
connotations of the term ‘complexity’ we will refer only to the number of pixels
drawn in the remainder of this paper.

Table 3. Mean Number of Pixels (‘Ink’) Used in Drawing

             + Repetition   (S.D.)    - Repetition   (S.D.)
+ Blocking   4384           (3813)    3916           (2138)
- Blocking   3222           (2326)    5232           (4504)

Fig. 6. Visual Complexity of Drawing Across Rounds


3.1 Individual Contributions 

Director’s Contribution. As Fig. 7 illustrates, directors became progressively
faster in both the repeating conditions, spending almost identical amounts of
time drawing on each round independently of whether the follower could draw
or not (averages of 5208 and 5292 msec, respectively; see footnote 2). By
contrast, drawing time for the non-repeating cases was sensitive to whether the
follower could draw or not.

An analysis of variance, following the same design as above, showed simple
main effects of both Repetition (F(1,1724) = 100.42, p = 0.00) and Blocking
(F(1,1724) = 30.98, p = 0.00) and a reliable Blocking x Repetition interaction
(F(1,1724) = 43.62, p = 0.00). Although Blocking resulted in no change to the
amount of time the director spent drawing repeated items, it did affect
non-repeating items, with directors taking reliably longer in the unblocked
condition (averages of 9,401 msec unblocked and 6,382 msec blocked).

2 The amount of time spent drawing was measured as the total amount of time that
the pen was pressed on the screen drawing lines, i.e., excluding time spent erasing.

[Figs. 7 and 8: axis labels Time Drawing (msec) and Total Line Length (Pixels)]

Fig. 9. Amount of Drawing by the Follower

As Fig. 8 illustrates, the pattern of results for amount of drawing by directors,
measured in terms of total line length, contrasts with the pattern of results for
drawing time (Fig. 7). In the Blocked conditions, where only the director can
draw, the amount of drawing carried out for the repeating items converges on that
for the non-repeating items.
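
Total line length and drawing time can both be derived from pen samples. The sketch below assumes a hypothetical representation in which each stroke is a list of (x, y, time in msec) samples recorded while the pen is pressed down in drawing mode; erasing strokes are taken to be excluded before this step. The real log format is not specified in the text.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, int]  # x (pixels), y (pixels), time (msec)
Stroke = List[Sample]


def total_line_length(strokes: List[Stroke]) -> float:
    """Sum of straight-line distances between consecutive pen samples."""
    return sum(math.dist(p[:2], q[:2])
               for stroke in strokes
               for p, q in zip(stroke, stroke[1:]))


def drawing_time_ms(strokes: List[Stroke]) -> int:
    """Total time the pen was pressed down, summed over strokes."""
    return sum(stroke[-1][2] - stroke[0][2] for stroke in strokes if stroke)


strokes = [
    [(0, 0, 0), (3, 4, 100)],
    [(10, 10, 500), (10, 20, 650), (20, 20, 900)],
]
print(total_line_length(strokes), drawing_time_ms(strokes))  # prints "25.0 500"
```

Because the two measures are computed independently, a drawing can become faster to execute without becoming shorter, which is the dissociation discussed for the Blocked conditions.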

An analysis of variance showed reliable main effects of Repetition (F(1,1724) =
8.70, p = 0.00) and Blocking (F(1,1724) = 4.68, p = 0.03) and a reliable
Blocking x Repetition interaction (F(1,1724) = 42.40, p = 0.00). If both
participants could contribute, directors drew least for repeating items,
whereas if only one participant could contribute, directors drew least for
non-repeating items.



Follower’s Contribution. The configuration of the whiteboard ensured that 
followers could only draw in the unblocked conditions. Fig. 9 shows the trends
in followers’ drawing activities, measured as total line length, in the two
unblocked conditions. For repeating items, followers drew progressively less over
time whereas for non-repeating items they drew progressively more. 

An analysis of variance with Repetition as a single between-subjects factor
shows a reliable difference in the overall amount of drawing (F(1,862) = 44.08,
p = 0.00) with followers in the non-repeat condition producing roughly six times
as much drawing (mean = 389.47) as in the repeat condition (mean = 65.87).

To provide an analysis of the spatial distribution of drawing activities, the 
whiteboard drawing area, 1024 by 768 pixels, was divided into an arbitrary 32 
by 24 grid (with each grid square being 32 by 32 pixels) and the number of grid 
squares in which both subjects drew at least once was counted. This provides 
an estimate of spatial overlap of drawing that partially corrects for amount of 
drawing. An analysis of variance with Repetition as a single between-subjects
factor showed a reliable main effect of Repetition (F(1,862) = 41.68, p = 0.00).
Followers in the non-repeating condition were more likely to draw in the same
region as the director (mean = 3.3 squares) than in the repeating condition
(mean = 0.18 squares).
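
The spatial overlap measure can be sketched as follows. The grid dimensions are those given above (a 1024 by 768 pixel area divided into 32 by 32 pixel squares); representing each participant's drawing as a list of sampled (x, y) pen positions is an assumption made for illustration, and a stroke that crosses a square between two samples would be missed by this simplified version.

```python
def occupied_squares(points, cell=32, width=1024, height=768):
    """Return the set of (column, row) grid squares containing a pen position."""
    squares = set()
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            squares.add((int(x) // cell, int(y) // cell))
    return squares


def spatial_overlap(director_points, follower_points, cell=32):
    """Number of grid squares in which both participants drew at least once."""
    return len(occupied_squares(director_points, cell)
               & occupied_squares(follower_points, cell))


director = [(0, 0), (40, 40), (1000, 700)]
follower = [(31, 31), (33, 33), (500, 500)]
print(spatial_overlap(director, follower))  # prints "2"
```

Counting shared squares rather than shared pixels makes the measure tolerant of small offsets between the two participants' marks, for example a follower circling part of the director's drawing.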

4 Discussion 

Repetition and interaction both influence the way participants used drawing to 
represent the target items. The data provide evidence of a number of different 
processes contributing to changes in the form and structure of the drawings 
produced in this task. 

The simplest effect is practice. In the Blocked conditions, directors’ drawings
of repeated items were produced more quickly and using fewer lines relative to the
non-repeated drawings. However, they did not use less drawing overall, measured
in terms of line length and ‘ink’ used, and were not scored as less visually complex
(although see the reservations above) than the drawings of non-repeating items.
The divergence between savings in drawing time and amount of drawing suggests
that, when items repeat, directors become progressively better at executing the
same drawing. This effect of practice is independent of interaction; directors’
production of drawings for repeating items becomes progressively faster in both
the Blocked and Unblocked conditions (cf. Fig. 7).

The results also demonstrate the influence of communicative interaction on 
the form of the drawings produced. As noted, if only the director can draw on 
the whiteboard there is no difference in the overall amount of drawing produced 
for repeating and non-repeating items. When both participants can draw there
is a significant change in the amount of drawing; however, the pattern of change
is different for repeating and non-repeating items.

For the repeating items, directors produce reliably less drawing when the fol- 
lower can also draw. Although it is likely that followers’ ability to recognise the 
drawing for a repeating item improves over rounds, the advantage of interaction 
cannot be attributed solely to followers providing earlier feedback. This would 
predict shorter drawing times by the director and there is no reliable differ- 
ence in the director’s drawing time in the Blocked and Unblocked conditions for 
repeating items. Additionally, a follower could equally well signal earlier
recognition in both conditions by pressing a response button sooner. It appears
that the ability of directors to produce simpler drawings in the unblocked
repeating condition is related to the particular kinds of feedback followers
can provide on the whiteboard.

For the non-repeating items the pattern is reversed. When both participants 
can draw on the whiteboard they draw more. This is true both for global mea- 
sures of what was drawn and for the individual drawing activities of the director 
and follower. In contrast to the repeating condition, the follower draws progres- 
sively more, as illustrated in Fig. 9, and is more likely to draw in the same region 
as the director. 

Inspection of the drawings produced in the Unblocked condition indicates 
that these differences in drawing activity correspond to different types of commu- 
nicative exchange in the repeating and non-repeating conditions. In the repeating 
condition, followers typically produce ticks, crosses, or question marks to signal
their level of understanding of the director’s contribution. In the non-repeating 
case followers engage more directly with the production of the drawings. They 
circle and annotate parts of the director’s drawings to indicate what elements 
need clarification or elaboration and in some cases directly add to the drawings 
as they are produced. 

The differential effects of interaction on drawings of repeating and non- 
repeating items are, we propose, most readily understood in terms of differences 
between different kinds of graphical dialogue. In the baseline case, corresponding 
to the Blocked conditions, the only feedback directors receive from the follower
is that they have made a selection. As the data show, this does not, in general, 
provide a sufficient basis for directors to adapt the representations they produce. 
They become more efficient at reproducing the same drawing, but an informal 
analysis of the drawings shows that the drawings do not change their form or 
type. 

When both participants can draw on the whiteboard they can engage in 
richer, multi-turn, exchanges and this leads to changes in the kinds of repre- 
sentations they produce. However, the type of exchange is conditioned by the 
communicative demands of the task. 

Where items repeat, the interaction is analogous to an instructional dialogue.
The director produces a drawing of the target item and the follower explicitly
signals whether they understand, e.g., with a tick, or require clarification,
e.g., with a cross or a question mark. Once they have used these explicit
signals of understanding to establish the mutual belief that they understand
which description refers to which item, they are subsequently able to contract
or abbreviate repetitions of those descriptions without further negotiation
(cf. [11]). As a result, the amount of drawing reduces and the follower’s
contributions tail off.

In the non-repeating condition, pairs need to repeat the process of
establishing mutual understanding for each new item. Although initially this
involves accepting or rejecting the director’s drawings in the same way as for
repeating items, it appears to evolve into something analogous to a
negotiation. Increasingly, followers collaborate directly on the production of
the drawing, in some cases by identifying elements of the director’s drawing
that are problematic, in other cases by adding to it themselves. This process
of mutual-modification leads to more extensive and more complex drawings and,
arguably, provides better evidence of mutual understanding.

5 Conclusion 

These results highlight the interrelationship between the forms of representations 
and the kinds of interactions in which they are deployed. In studies of graphi- 
cal and verbal representation it is common to consider processes of production 
and comprehension in isolation; people produce representations and other people
either succeed or fail in comprehending them, analogous to the baseline case
described above. Representational conventions are, however, more dynamic than
this idealisation suggests. Importantly, people modify and adapt them in collab- 
oration through their communicative exchanges. 

In the experiment reported here we have distinguished between two processes
through which representational conventions are adapted in graphical dialogues.
The first is the reduction or abbreviation of recurrent graphical referring
expressions (cf. [5]); the second is adaptation and elaboration through
mutual-modification. Both have analogues in verbal dialogue. The former
process correlates with the reduction of referring expressions described by
Clark and colleagues (e.g., [7, 11]), the latter with certain kinds of
clarification and repair processes in conversation.

We speculate that mutual-modification is central to the evolution of new 
symbols and new representational systems. The reduction of recurrent repre- 
sentations is, by definition, a conservative process that can support refinement 
of representations but not changes in their interpretation. For repeating items 
the problem is to arrive at the most efficient label for an item. However, the
development and modification of new conventions requires processes that can 
sustain generalisations across multiple items. This entails being able to modify 
and adapt the semantics of the representational system. Mutual-modification, 
we propose, provides a basis for this by providing mechanisms through which 
individuals can co-ordinate their interpretations of both drawings as a whole and 
drawing elements. This could provide a foundation on which new conventions 
can be constructed. 

Abstract. We conducted an eye-tracking study of mechanical problem solving
from cross-sectional diagrams of devices. Response time, accuracy and eye 
movement data were collected and analyzed for 72 problem-solving episodes (9 
subjects solving 8 problems each). Results indicate that longer response times 
and visually attending to more components of a device do not necessarily lead 
to increased accuracy. However, more focus shifts, visually attending to 
components in the order of causal propagation, and longer durations of visual 
attention allocated to critical components of the devices appear to be 
characteristics that separate successful problem solvers from unsuccessful ones. 
These findings throw light on effective diagrammatic reasoning strategies, 
provide empirical support to a cognitive model of comprehension, and suggest 
ideas for the design of information displays that support causal reasoning. 



1 Introduction 

How people reason about and solve graphically presented problems drawn from 
visual, spatial and causal domains has been a topic of interest in diagrammatic 
reasoning research. This interest stems from the fact that certain characteristics of 
systems in such domains lend themselves to graphical representations in which the 
visual properties and spatial distribution of components aid the problem solver in 
directing his or her reasoning along paths of causal propagation. These characteristics 
are: (1) components of a system are spatially distributed; (2) systems are dynamic, i.e. 
components and their properties change over time; (3) system components causally 
interact with each other; (4) such interactions can be traced along chains of cause- 
effect relationships (which we call lines of action) that branch and merge in spatial 
and temporal dimensions; and (5) predicting the operation of a system requires 
reasoning from a given set of initial conditions to infer these causal chains of events. 
Reasoning about mechanical devices from cross-sectional diagrams is a case in point. 
Other examples of problems from visual, spatial and causal domains are circuit 
design, weather forecasting, emergency response coordination and military course-of- 
action planning. 

Understanding the cognitive processes underlying such reasoning, especially 
strategies that separate successful problem solvers from unsuccessful ones, can 
provide insights into the design of information displays that actively aid the problem 
solver and enhance his or her performance. In the context of a research program on 
designing and evaluating such information displays (see Section 4 for details), we
report on an experiment that investigated how people make predictions about the 
operation of a mechanical device, when given a labeled cross-sectional diagram of the 
device and an initial condition - specified as the behavior of one of the device's 
components. In addition to determining the accuracy of their answers, we measured 
their response times and collected data on their eye movements across the stimulus 
display. Our goal was to understand the relations among accuracy, response time and 
patterns of visual attention allocation across the display. 

The rest of this paper is organized as follows. In Section 2 we summarize earlier 
work on mechanical reasoning from diagrams that has a bearing on the present 
research. Section 3 describes the experiment and its results. In the final section, 
conclusions that can be drawn from the experiment's results, their implications and 
our future research are discussed. 



2 Related Research 

Larkin and Simon (1987) undertook a computational analysis of diagrammatic versus 
sentential representations, and described features of diagrams that aid reasoning. 
Extending this line of enquiry, Cheng (1996) proposed twelve functional roles of 
diagrams in problem solving: (1) showing spatial structure and organization, (2) 
capturing physical relations, (3) showing physical assembly, (4) defining and 
distinguishing variables, terms and components, (5) displaying values, (6) depicting 
states, (7) depicting state spaces, (8) encoding temporal sequences and processes, (9) 
abstracting process flow and control, (10) capturing laws, (11) doing computations, 
and (12) computation sequencing. These analyses suggest that diagrams can aid a 
problem solver by explicating and facilitating inferences about the components, 
structure, states and spatio-temporal sequences of causally connected events of 
systems in visual, causal and spatial domains. 

Other research has delved into details of diagrammatic reasoning. Hegarty (1992) 
provides an account, based on reaction time and eye-fixation data, of how people 
infer the motions of mechanical devices from diagrams. She found evidence for an 
incremental reasoning process: the device is decomposed into its components and 
their behaviors are mentally animated in the direction of causality. However, mental 
animation is constrained by working memory capacity, such that people are only able 
to mentally animate one or two component motions at a given time. Furthermore, the 
eye-fixation data indicated that mental animation is accompanied by inspection of the 
relevant parts of the diagram. 

In earlier research Narayanan, Suwa and Motoda (1994) investigated how people 
solved mechanical reasoning problems presented as diagrams. Analysis of subjects' 
verbal and gestural protocols supported the incremental reasoning model. It also 
suggested that shifts of the problem solver's focus from component to component 
were mediated to a great extent by the connectivity of components, internal 
visualization of component behaviors, propagation of causality and search for 
information in the diagram. 

They further analyzed the intermediate hypotheses (extracted from verbal 
protocols) of subjects who were reasoning about an “impossible” mechanical device 
(see problem 6, Figure 1, later in the paper). In this device's operation, the causal 
chains of events branch and merge in the spatial dimension (within the device's 
physical structure) and the temporal dimension (events occur concurrently as well as 
sequentially). This analysis revealed that the trajectory of reasoning was mediated by 
the lines of action (Narayanan, Suwa & Motoda, 1995). However, instead of 
following a systematic strategy such as depth-first (traverse each branch fully before 
starting on another) or breadth-first (traverse all branches in an alternating fashion), 
subjects took a mixed approach with elements of both. They also retraced their 
reasoning paths multiple times, especially near the merging points of the lines of 
action. 
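
To make the two systematic strategies concrete, the following minimal sketch traverses a small lines-of-action graph that branches and merges. The graph, the component names and the function names are hypothetical and chosen only for illustration; they are not taken from problem 6 or from the cited study.

```python
from collections import deque

# Hypothetical lines of action: A drives B and C (branch); both chains end at D (merge).
LINES_OF_ACTION = {
    "A": ["B", "C"],
    "B": ["B2"],
    "C": ["C2"],
    "B2": ["D"],
    "C2": ["D"],
    "D": [],
}


def depth_first(graph, start):
    """Follow each branch to its end before starting on another."""
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors in reverse so the first-listed branch is explored first.
        stack.extend(reversed(graph[node]))
    return order


def breadth_first(graph, start):
    """Advance all branches one causal step at a time."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for successor in graph[node]:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return order


dfs_order = depth_first(LINES_OF_ACTION, "A")
bfs_order = breadth_first(LINES_OF_ACTION, "A")
print(dfs_order)
print(bfs_order)
```

A mixed approach of the kind observed in the protocols would correspond to a visiting order that matches neither of these two sequences and that revisits components near the merge point D.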

3 Experiment 

Prior research has thus identified several characteristics of mechanical reasoning from 
diagrams. Three important ones are decomposing the device into its components and 
attending to each individually, reasoning along the lines of action, and focus shifts 
mediated by several factors. In the experiment reported here we investigated whether 
successful and unsuccessful problem solvers could be separated in terms of these 
characteristics. 

Response time, accuracy and eye movement data were collected and analyzed for 
72 problem solving episodes: 9 subjects solving 8 problems of mechanical reasoning 
from cross-sectional and labeled diagrams. Components attended to, the direction of 
reasoning and focus shifts were determined from eye movement data. 

Our interpretation of eye movement data is based on the premise that fixations are 
likely indicators of what the problem solver is (or has been) thinking about. The 
assumption (called the eye-mind assumption) is that the locus of eye fixations 
corresponds to the information being processed by the cognitive system. The
eye-mind assumption is supported by two independent lines of research. Just and 
Carpenter (1976) discuss extensive evidence supporting this assumption for
goal-directed tasks that require information to be encoded and processed from the visual 
environment. The problem solving tasks in our experiment were goal-directed (“given 
the initial motion of one component, predict the motion of another component further 
downstream”). Information needed to carry out these tasks had to be encoded and 
processed from a visual stimulus displayed on a computer monitor. Additional 
evidence that eye movement traces carry information about cognitive processes that 
underlie mechanical reasoning from device diagrams appears in (Rozenblit, Spivey & 
Wojslawowicz, 1998). These researchers collected eye movement traces of subjects 
making predictions about mechanical devices presented as diagrams and gave the 
traces to independent raters. The traces alone were sufficient for the raters (who did 
not see the device diagrams) to reliably predict both the principal axes of orientation 
of the devices subjects saw, and whether the subjects solved each problem correctly. 



304 Daesub Yoon and N. Hari Narayanan 



3.1 Procedure 

Nine engineering graduate students volunteered to participate. They were 
compensated with a payment of $10 each. Each subject solved 8 problems. Each 
problem was displayed as a labeled cross-sectional diagram of a device with an 
accompanying question and possible answers (Figure 1 shows the 8 stimuli exactly as 
displayed to subjects). The first three problems involve simple mechanical devices. 
The fourth is a Rube Goldberg-like device for frying an egg. These four problems 
have been used in prior research (Narayanan, Suwa & Motoda, 1994). The fifth is a 
pulley system previously used by Hegarty in her experiments (1992). The sixth is an 
“impossible” problem, also used in prior research (Narayanan, Suwa & Motoda, 
1995). It involves branching and merging causal event chains, unlike the previous 
five problems. The seventh and eighth problems are about the flushing cistern, the 
most complex of all seven devices used in this experiment. The operation of this 
device also involves branching and merging causal event chains. Furthermore, in 
previous studies of comprehension of this device from interactive graphical 
presentations, it was found that while subjects were able to infer behaviors of 
components within each causal chain, they had difficulty integrating information 
between the two causal chains (Hegarty, Quilici, Narayanan, Holmquist & Moreno, 
1999). 

The experiment was conducted one subject at a time in an eye tracking laboratory 
equipped with a head-mounted eye tracker, eye tracking computer and a stimulus 
display computer. The eye tracker we used is the Eye Link model from SMI Inc. It 
consists of a headband, to which two infrared sources and cameras (one for each eye) 
are attached. It is a video-based eye tracker that detects pupil and corneal reflections 
from infrared illumination to compute screen coordinates of the subject's gaze point 
on the stimulus display monitor once every 4 milliseconds. The headband is attached 
by cable to a PC that functions as an experiment control station and also carries 
out the necessary computations. This PC communicates with the stimulus display 
computer via an Ethernet link. The problems were presented as static pictures on the 
monitor of the stimulus display computer. Subjects sat in a high-backed chair, and 
viewed the problem on a 20-inch monitor mounted on a wall at eye level, at a 
distance of approximately 3 feet. The experimenter sat behind the subject and 
controlled the experiment through the eye-tracking computer. 

First, a subject solved three practice problems with piston, pulley and gear 
systems. The devices in the practice problems were much simpler than the devices in 
the experimental problems. A 10-minute break and calibration of the eye tracker 
followed. The actual experiment began by the subject clicking the left mouse button 
to display the first stimulus. When ready to make the prediction, the subject depressed 
a number key on the keyboard to indicate his or her answer. This automatically 
brought up the next stimulus. The key presses were used to compute and record 
response times and determine accuracy. The specific problem to be solved, in the 
form of a question, appeared as part of each stimulus. This text also specified the 
number keys corresponding to possible answers. Each problem had two or three 
possible answers, of which only one was correct. Eye movement data was collected 
and recorded for all problems. 

Fig. 1. Eight experimental problems. The bar chart below each problem shows the mean 
duration (as a percentage of total time on task) of visual attention on each component of the 
device in that problem by successful (light bars) and unsuccessful (dark bars) subjects 

[Figure 1, problem 3 panel. Question 3: "When gas at high pressure is pumped in through 
holeA, is piston going to oscillate up and down near the holeB?" Answers: 1. Yes, 2. No, 
3. Can't say. Component labels: holeA, holeC, piston, nholeB, holeB.] 

[Figure 1, problem 7 panel. Question 7: "When handleA is pushed down, will objectB not 
move down below leveK?" Answers: 1. Yes, 2. No.] 

The eighth problem showed the same device as in Problem 7, a flushing 
cistern, with the question: “After answering Question 7, will the water level 
rise until objectB reaches levelD?” 

3.2 Results 

We computed five dependent measures: accuracy, response time, coverage, number 
of focus shifts and (causal) order. Accuracy is a nominal variable categorizing a 
subject's answer to a problem as correct or incorrect. The correct answers to the 8 
problems are: Question 1 - No; Question 2 - Yes; Question 3 - Can't say; Question 4 
- Yes; Question 5 - No; Question 6 - No; Question 7 - No; Question 8 - Yes. The 
accuracies of the nine subjects were 62.5% (the first subject was correct in 5 out of 
the 8 problems), 50% (4/8), 50% (4/8), 62.5% (5/8), 12.5% (1/8), 87.5% (7/8), 50% 
(4/8), 62.5% (5/8) and 75% (6/8) respectively. The accuracies for each of the 8 
problems were 44.4% (4 subjects out of 9 were correct), 22.2% (2/9), 22.2% (2/9), 
88.9% (8/9), 66.7% (6/9), 88.9% (8/9), 66.7% (6/9), and 55.6% (5/9) respectively. 
Response times were automatically recorded by the stimulus display computer using 
subjects' key presses. 
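
The reported accuracies can be checked with a few lines of arithmetic. The sketch below uses only the counts of correct answers stated above; the variable names are ours. The totals agree with the 41 correct and 31 incorrect episodes used in the statistical comparison later in this section.

```python
# Number of correct answers per subject and per problem, as reported in the text.
correct_per_subject = [5, 4, 4, 5, 1, 7, 4, 5, 6]  # each out of 8 problems
correct_per_problem = [4, 2, 2, 8, 6, 8, 6, 5]     # each out of 9 subjects

subject_accuracy = [100 * c / 8 for c in correct_per_subject]
problem_accuracy = [round(100 * c / 9, 1) for c in correct_per_problem]

total_episodes = 9 * 8
total_correct = sum(correct_per_subject)
total_incorrect = total_episodes - total_correct

print(subject_accuracy)   # percentages per subject
print(problem_accuracy)   # percentages per problem, rounded to one decimal
print(total_correct, total_incorrect)
```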

In each device diagram, individual components were delineated by bounding 
boxes. Once the co-ordinates of these bounding boxes were determined, we could 
associate fixations with components, and calculate the duration of each subject's gaze 
on each component as a percentage of the total time the subject spent on that problem. 
Coverage for a problem and a subject was computed as the percentage of components 
of the corresponding device that attracted at least one fixation by the subject. 
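
As an illustration of the bounding box technique, the following minimal sketch assigns fixations to components and derives gaze duration percentages and coverage. The `Box` type, the function names, the fixation format (x, y, duration in milliseconds) and the example coordinates are assumptions made for illustration; they are not taken from the analysis software used in the experiment.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box of one device component (screen coordinates)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def component_at(boxes, x, y):
    """Return the name of the first box containing (x, y), or None."""
    for box in boxes:
        if box.contains(x, y):
            return box.name
    return None


def gaze_percentages(boxes, fixations, total_time_ms):
    """Gaze duration per component as a percentage of total time on the problem."""
    durations = {box.name: 0.0 for box in boxes}
    for x, y, duration_ms in fixations:
        name = component_at(boxes, x, y)
        if name is not None:  # fixations outside every box are not counted
            durations[name] += duration_ms
    return {name: 100.0 * d / total_time_ms for name, d in durations.items()}


def coverage(percentages):
    """Percentage of components with at least one fixation (assumes durations > 0)."""
    fixated = sum(1 for p in percentages.values() if p > 0)
    return 100.0 * fixated / len(percentages)


# Hypothetical layout and fixations for a three-component device.
boxes = [
    Box("holeA", 0, 0, 99, 99),
    Box("piston", 100, 0, 199, 99),
    Box("holeB", 200, 0, 299, 99),
]
fixations = [(50, 50, 400), (150, 50, 600), (500, 500, 300), (60, 40, 200)]
pct = gaze_percentages(boxes, fixations, total_time_ms=2000)
cov = coverage(pct)
print(pct, cov)
```

In this example two of the three components are fixated, so coverage is two thirds, and the fixation at (500, 500) falls outside every box and contributes to total time only.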

Using the bounding box technique, we could also determine a subject's shifts of 
visual focus from component to component in each problem from raw eye movement 
data. We derived S, an ordered sequence of device components that a subject attended 
to during a session, for each subject and each problem. This was done by aggregating 
consecutive fixations inside the bounding box of a component and detecting when 
another component was fixated upon. S begins with the first device component 
attended to, and ends with the last component attended to before the subject pressed a 
number key indicating his or her solution. Number of focus shifts by a subject for a 
problem is then the number of transitions in S, i.e., (the size of S) - 1. Fixations on 
the question and blank display regions were ignored in this analysis. 
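
The derivation of S and of the number of focus shifts can be sketched as follows. The input is assumed to be the per-fixation component names in temporal order, with `None` standing for fixations on the question text or blank regions; this encoding, the function names and the example data are illustrative assumptions. One consequence of simply dropping ignored fixations, as done here, is that a look away to the question and back to the same component does not count as a shift.

```python
def attended_sequence(fixated_components):
    """Collapse consecutive fixations on the same component into one entry of S."""
    sequence = []
    for name in fixated_components:
        if name is None:  # question text or blank region: ignored
            continue
        if not sequence or sequence[-1] != name:
            sequence.append(name)
    return sequence


def focus_shifts(sequence):
    """Number of transitions in S, i.e. (size of S) - 1, and 0 for an empty S."""
    return max(len(sequence) - 1, 0)


# Hypothetical fixation record for one subject on one problem.
record = ["handle", "handle", None, "rod", "rod", "gear", "rod"]
S = attended_sequence(record)
shifts = focus_shifts(S)
print(S, shifts)
```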

In the sequence S, if component j appears immediately after component i, and if i 
can causally influence j according to the lines of action in the device, then i-j 
represents a causal link in the sequence S. Consecutive causal links represent causal 
subsequences of S. The length of a causal subsequence is the number of causal links 
in it. The measure order is defined as the sum of squares of the lengths of causal 
subsequences in S. This captures the total number of correct cause-effect pairs of 
components that a subject considered, weighted by the length of unbroken lines of 
action that the subject considered (i.e. if subjects A and B both considered the same 
number of causal links, but if A looked at longer causal link chains than B, the value 
of order would be higher for A than B). 
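
The definition of order can be made concrete with a short sketch. It interprets a causal subsequence as a maximal run of consecutive causal links in S, which is our reading of the definition above; the component names and the set of causal links are hypothetical.

```python
def order_measure(sequence, causal_links):
    """Sum of squared lengths of maximal runs of consecutive causal links in S."""
    total, run = 0, 0
    for pair in zip(sequence, sequence[1:]):
        if pair in causal_links:
            run += 1            # the current causal subsequence grows by one link
        else:
            total += run * run  # a non-causal transition closes the current run
            run = 0
    return total + run * run    # close the final run


# Hypothetical lines of action: A -> B -> C -> D.
links = {("A", "B"), ("B", "C"), ("C", "D")}

# Both sequences contain three causal links, but only the first follows them unbroken.
seq_a = ["A", "B", "C", "D"]
seq_b = ["A", "B", "D", "B", "C", "A", "C", "D"]
order_a = order_measure(seq_a, links)
order_b = order_measure(seq_b, links)
print(order_a, order_b)
```

Subject A's single chain of three links contributes 3 squared, whereas subject B's three isolated links contribute 1 each, which is the weighting by chain length described above.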

Analyses were conducted to compare response times, coverage, focus shifts and 
order of successful problem solvers with those of unsuccessful problem solvers. The 
goal was to discover whether successful problem solvers could be characterized by 
longer response times, higher coverage, more focus shifts or larger values of order. 

Fig. 2. These bar charts show, for each problem (on the x-axis), the means (on the y-axis) of 
response time (seconds), coverage, number of focus shifts and order for successful (light bars) 
and unsuccessful (dark bars) subjects 

Statistical analyses (t-tests) were carried out to compare the mean values of 
response time, coverage, number of focus shifts and order. Two groups of problem 
solving episodes from the total of 72 (9 subjects X 8 problems) were compared: a 
group of 41 in which the subjects provided the correct answer and another group of 
31 in which the subjects were wrong. We found no significant difference between the 
mean response times of successful and unsuccessful subjects across all 8 problems, 
t(71) = 1.138, p < 0.26. The top-left bar chart in Figure 2 shows the mean response 
times of successful and unsuccessful subjects for each problem. For problems 1, 2, 4 
and 8 subjects who provided the correct answer had lower mean response times than 
subjects who were wrong. For problems 3, 5, 6 and 7 this reversed. 
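
For readers unfamiliar with this kind of comparison, the sketch below computes a two-sample t statistic with pooled variance. The text does not state which t-test variant was used, so the pooled form is an assumption, and the sample values are invented for illustration rather than taken from the experiment. Under this form, groups of 41 and 31 episodes would give 41 + 31 - 2 = 70 degrees of freedom, whereas the text reports t(71), so the test actually used may differ from this sketch.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Invented values of a measure for correct and incorrect episodes.
correct_episodes = [9, 16, 4, 9, 25]
incorrect_episodes = [1, 4, 2, 1]
t_value, dof = pooled_t(correct_episodes, incorrect_episodes)
print(round(t_value, 3), dof)
```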

There was no significant difference between the mean percentage of components 
that successful and unsuccessful subjects attended to across all 8 problems,
t(71) = -1.414, p < 0.16. The top-right bar chart in Figure 2 shows the mean coverage of 
successful and unsuccessful subjects for each problem. For problems 1, 3, 4, 5, 6 and 
7 subjects who provided the correct answer had lower mean component coverage 
than subjects who were wrong. For problems 2 and 8 successful subjects exhibited a 
higher mean coverage than unsuccessful ones. 

We found marginal significance in the difference between the means of the 
number of focus shifts of successful and unsuccessful subjects across all 8 problems, 
t(71) = 1.792, p < 0.08. The bottom-left bar chart in Figure 2 shows the mean number 
of focus shifts of successful and unsuccessful subjects for each problem. For 
problems 2, 4 and 5 subjects who provided the correct answer exhibited a lower 
number of focus shifts on average than subjects who were wrong. For problems 1, 3, 
6, 7 and 8 successful subjects made more focus shifts on average than unsuccessful 
ones. 

However, t-test comparisons of the mean value of order indicated that subjects 
who were accurate considered significantly more causal connections and longer lines 
of action than subjects who provided wrong answers did, across all 8 problems, t(71) 
= 2.934, p < 0.0045. The bottom-right bar chart in Figure 2 shows the mean values of 
order for successful and unsuccessful subjects for each problem. For all problems 
except the fifth, successful subjects had higher mean values of order than subjects 
who were inaccurate. 

Next, we investigated the relation between gazing on particular components and 
accuracy. Even though no significant differences between successful and 
unsuccessful problem solvers were discovered in terms of their response times and 
the percentage of components of each device they visually attended to during 
problem solving, we explored this further by calculating the gaze durations of each 
subject on each component of each device as a percentage of that subject's response 
time for that device. From this data we calculated the mean gaze duration percentages 
of successful and unsuccessful problem solvers for each component of the eight 
problems. These are shown as bar charts appearing under each of the 8 problems in 
Figure 1. Note that gaze duration percentages are computed and shown in the bar 
charts for all major components of a device, even if only some of these components 
are labeled in the device's diagram that subjects saw. For example, the label “nholeB” 
in bar charts corresponding to problems 2 and 3 refers to the spring and surrounding 
area inside the cylinder below holeB in the corresponding two devices, though this 
label does not appear in the device diagrams. 

The bar charts in Figure 1 illustrate one consistent pattern across all problems. 
Successful problem solvers are differentiated from unsuccessful ones by the fact that 
they spent more time, as a percentage of total time to solve the problem, on average 
on components that are critical to solving the problem. For problem 1, rodB is the 
important component. For problems 2 and 3, it is holeB. For problem 4, the last 
components in the line of action - pulleyB, knife and egg - are the critical ones. For 
problem 5 the various string segments are the critical components. For problem 6, its 
“impossibility” becomes evident when one considers the combination of three 
circular gears: gearRight, gearLeft and gearCenter. For problem 7 the critical 
components are in the region of the small arm of the siphon pipe (components joint1 
and joint2). For correctly answering the eighth problem's question, the eventual 
closing of the inlet valve (component pipe3) is a critical inference. As can be seen in 
Figure 1, successful problem solvers spent a higher percentage of time fixating on 
these components. Just and Carpenter (1976) suggest that gaze duration provides a 
measure of the time spent processing the corresponding symbol. Therefore it is 
reasonable to conclude that a longer duration (relative to time on task) of visual 
attention allocated to critical components of the devices is a characteristic that 
separates successful problem solvers from unsuccessful ones. 

4 Discussion: Implications and Future Research 

This paper described an experiment on diagrammatic reasoning about mechanical 
devices in which we investigated characteristics that differentiate successful and 
unsuccessful problem solvers. One outcome measure (accuracy) and five process 
measures (response time, coverage, number of focus shifts, causal order of processing 
and relative gaze durations on individual components), four of which were derived 
from eye movements, were analyzed to examine whether successful problem solvers 
could be characterized in these terms. 

The results support several conclusions regarding effective diagrammatic 
reasoning strategies. Spending more time on the task and visually attending to more 
components do not necessarily lead to success in mechanical reasoning from device 
diagrams. On the other hand, considering more component pairs that are causally 
related and attending to longer causal chains of components can lead to better 
accuracy. Concentrating on critical components for relatively longer durations also 
appears to improve accuracy in problem solving. Increased shifting of one's focus of 
visual attention from component to component during problem solving is marginally 
positively related to accuracy, but clearly which components are attended to and in 
which order are more significant predictors. 

These findings suggest that in training novices to qualitatively and successfully 
reason about mechanical systems from diagrams, it is important for instruction to 
focus on developing skills of determining and following causal chains of events in the 
operation of the device, and identifying components that are critical to solving the 
problem at hand. An important characteristic of the mechanical domain is that it 
consists of dynamic systems with spatially distributed components that causally 
interact and give rise to event chains that branch and merge in spatial and temporal 
dimensions. Several other domains share this characteristic. Therefore we postulate 
that the pedagogical implications of our findings extend to these domains, such as 
meteorology, as well. 

Another implication of this research is that it has provided new empirical evidence 
supporting a previously reported cognitive process model of causal system 
comprehension from text and diagrams. Narayanan and Hegarty describe this 
cognitive model of multimodal comprehension (Narayanan & Hegarty, 1998), based 
on which they argue that an information display with the following six characteristics 
is likely to enhance comprehension (Narayanan & Hegarty, 2002). It should aid the 
viewer in decomposing the system being described, enable the viewer to invoke 
relevant prior knowledge, point out the common referents of external representations 
in different modalities, explain domain laws that govern the system, explicate the 
lines of action in the operation of the system, and encourage mental animation. While 
they discuss empirical support for some of these characteristics, the efficacy of a 
display that supports reasoning along the lines of action has not been experimentally 
investigated. Our finding that successful problem solvers exhibit significantly higher 
values of the measure order, indicating consideration of more causal links along 
longer lines of action, does indeed suggest that a display that facilitates reasoning 
along the lines of action can enhance comprehension. 

Understanding the cognitive processes underlying causal reasoning from visual 
displays, especially strategies that separate successful problem solvers from 
unsuccessful ones, can provide insights into the design of displays that actively aid 
the problem solver and improve his or her problem solving performance. For 
instance, Grant and Spivey (2002) studied Karl Duncker's radiation problem, by first 
showing a diagram of the problem to subjects and determining the part of the diagram 
that received most attention from successful problem solvers. In a subsequent 
experiment, they attracted subjects' attention to that part through a blinking action, 
and found that merely attracting the problem solver's attention to that part of the 
display dramatically improved accuracy. But can the display regions that are critical, 
and the optimal order of visual attention allocation, for a problem be determined a 
priori? 

Our research suggests that, for problems drawn from domains with the five 
characteristics listed in the introduction of this paper, lines of action in the operation 
of the system provide spatial pathways of optimal visual attention allocation. 
Furthermore, components at the merge points in these pathways and components 
most closely associated with the problem being solved are critical, and allocating 
more attention to these can improve accuracy. This leads naturally to a research 
program on displays that track and guide a problem solver's visual attention, and 
provide relevant information at the right time and in the right place, to support causal 
reasoning. We term displays that thus exploit the trajectory of a viewer's focus shifts 
during problem solving “Reactive Information Displays” (Narayanan & Yoon, 2003). 
To be effective, such displays must have knowledge about the system/domain that is 
being displayed, knowledge about the problem solving task that the user is engaged 
in, knowledge regarding an applicable problem solving model and knowledge about 
the trajectory of the user's attention shifts. 

An introduction to Reactive Information Displays and an empirical study of four 
reactive strategies are presented in (Narayanan & Yoon, 2003). This study showed 
that a display that guides the user's visual attention along paths of causal propagation 
while demonstrating potential behaviors of individual components significantly 
improved the accuracy of mechanical problem solving. In another experiment (Yoon 
and Narayanan, 2004), we discovered that when subjects were first shown a 
mechanical reasoning problem and then given a second problem on the same device, 
but without any diagram, about half had fixations on the blank region of the display 
that the device diagram occupied in the first problem. While these subjects were no 
more accurate than those who did not exhibit any eye movements on the blank part of 
the display, they looked at more “virtual” components and had significantly higher 
values of the measure order, indicating that their eye movements on the blank region 
were systematic, along the lines of action (i.e. examining causally related chains of 
components). This suggests that Reactive Information Displays may be particularly 
useful for users of this kind, whose eye movements reflect cognitive processes even 
in the absence of an external stimulus. In current work, we are evaluating several 
reactive display strategies in the domain of algorithmic problem solving. Pursuing 
these lines of enquiry will remain the focus of our future research. 

Abstract. Graphical communications, such as dialogues using maps, 
drawings, or pictures, provide people with two independent modalities 
through which they can interact with each other. In such conversation, 
graphical interaction can be both sequential and parallel, affected by 
the activity-dependent constraints imposed by the task performed in the 
interaction (Umata, Shimojima, Katagiri, and Swoboda (2003)). In this 
paper, we compare the patterns of speech and graphical interaction in 
collaborative problem-solving tasks. The amount of speech overlap did 
not show a significant difference among four task conditions, although 
graphical overlaps did. This shows that both resource-dependent and 
activity-dependent constraints play a significant role in determining the 
interaction organization. 



1 Introduction 

Communication is an interactive activity. People have to organize their interaction 
to exchange information effectively. One example of such interaction organization 
is turn-taking in conversation: People typically take sequential speech 
turns in verbal communication (Sacks et al. (1974), Clark (1996)). 

Because of the collaborative nature of communicative acts, turn-taking involves 
a wide variety of factors such as sociological principles, limitations of human 
cognitive capacities, and so on. One possible factor for sequential turns in 
speech is that the resource characteristics of media involved in the activity dictate 
the style of interaction: The medium used for speech affords only one person’s 
speech sounds at a time. Sacks et al. (1974) regard verbal turns as an economic 
resource, distributed to conversation participants according to turn organization 
rules. According to them, one of the main effects of these turn organization rules 
is the sequentiality of utterances, namely, the almost complete absence of parallel 
or simultaneous utterances from conversation. “Overwhelmingly, one party 
talks at a time,” they observe. 

On the other hand, there are observations that suggest that communication 
media with different characteristics from speech may show different interaction 
organization patterns. Brennan (1990) reported that non-verbal signals, such as 
the movement of a cursor, did not observe the turn organization rules. Condon 
(1971) even claims that gestures and other bodily movements of participants in 
conversation exhibit mutual synchrony with each other. These findings apparently 
suggest that verbal and non-verbal media in interaction do not necessarily 
adhere to the same turn organization rules and that these differences may derive 
from the contrasting resource characteristics of the media used in the interaction. 

Drawing is one communication medium that has quite different characteristics 
from speech. First, drawing is persistent whereas speech is not. Drawing remains 
unless erased, whereas speech dissipates right after it occurs. People must comprehend 
speech in real time, and its content must be grounded without much 
delay. Drawing, however, can be understood later than when it is actually drawn. 
This may lead to different grounding patterns: Drawing activities may not have 
to be grounded incrementally, whereas verbal utterances must. 

Second, drawing has a much wider bandwidth than speech. Because of the 
limitations of human auditory processing, two or more utterances made at the 
same time are difficult to understand separately. Thus, it is unusual for more 
than one message to be simultaneously transmitted through speech, except when 
they are acoustically similar so as to make so-called “sync talk.” On the contrary, 
two or more drawing operations can occur at the same time without interfering 
with each other, provided that there is a sufficiently large drawing surface. 

Drawing interaction organization has been studied mainly in the Human 
Computer Interaction field in the context of computer-supported collaborative 
work, and there have been two conflicting assumptions. Stefik et al. (1987) report 
on a computational facility where multiple people engaged in spoken dialogue 
can simultaneously “draw” on a shared video display. The design idea was that 
this functionality would give everyone an equal chance to express themselves 
freely, without somebody ruling the collaborative activity. Thus, a key assumption 
of their work is that people will use simultaneous drawing for communication, 
given the facility to do so. This assumption was also partially supported 
by Whittaker et al. (1991), who observed six groups of subjects using a virtual 
shared whiteboard for long-distance communication. They found frequent 
instances of simultaneous drawing on the whiteboard, especially when diagrams 
and tables were drawn or modified. 

Tatar et al. (1991), however, paint a contradictory picture in their observational 
study of two groups of subjects using the same system that Stefik et al. 
(1987) described. They found that despite the potential for simultaneous drawing, 
the subject interactions were largely sequential, with subjects trying to take 
drawing turns as well as speaking turns. They explained this fact by claiming that 
human communication generally proceeds incrementally, grounding expressed information 
one by one in real time. According to them, the facility for simultaneous drawing 
is based on an incorrect model of human communication, redundant at best and 
troublesome at worst. 

To approach this problem, Umata et al. (2003) provided yet another view 
based on the activity-dependent constraints imposed by the task performed in 
the interaction. The analyses show that sequential structure is mandatory in 
drawing either when the drawing reflects the dependency among the information 
to be expressed or when the drawing process itself reflects the proceedings of
a target event. 

There remains, however, one more question that needs to be clarified: Is
speech turn organization unaffected by those activity-dependent constraints? If so,
we can safely argue that speech is already restricted by resource-dependent
constraints, which supports the traditional view of speech turns.
On the contrary, if speech turns are shown to be affected by activity-dependent 
constraints, speech also might have the possibility of parallel turns in spite of 
its resource characteristics. The purpose of this paper is to see whether activity- 
dependent constraints affect speech turn organization or not. We examine speech 
turn organization patterns in the four tasks analyzed in Umata et al. (2003), and 
show that speech turns are not affected by activity-dependent constraints. 

2 Drawing Turns and Speech Turns 

As we have seen in the previous section, the sequentiality of speech turns has 
been attributed to the resource characteristics of speech, namely non-persistence 
and restricted bandwidth. The assumption is that we cannot comprehend two 
spoken utterances at the same time because of the bandwidth limitation, while 
we cannot delay comprehending one utterance until later because of the non- 
persistent character. Drawing, on the contrary, has a quite different nature in 
regard to these assumptions, and may have potential for parallel turn organiza- 
tion. There have been seemingly contradictory observations about drawing turn 
organization; one is that drawing turns can be parallel, the other is that they 
cannot be. Umata et al. (2003) suggested that there is yet another kind of con- 
straint based on the activities people are engaged in. According to this view, 
sequential structure is mandatory in drawing in some cases but not in others. 

Sequentiality Constraints 

1. Drawing interaction occurs in sequential turns under either of the 
following conditions: 

(a) Information Dependency Condition: When there is a dependency 
among the information to be expressed by drawing; 

(b) Event Alignment Condition: When drawing operations them- 
selves are used as expressions of the proceedings of target events. 

2. Sequential turns are not mandatory in drawing activities when nei- 
ther condition holds (and persistence and certain bandwidth of draw- 
ing are provided.) 

The rationale for the information dependency condition is the intuition that 
when one piece of information depends on another, the grounding of the former 
piece of information is more efficient after the grounding of the latter has been 
completed. This should be the case whether a particular speaker is explaining the 
logical dependency in question to her partners or all participants are following 
the logical steps together. 
Event alignment is a strategy for expressing the unfolding of an event dynam-
ically, using the process of drawing itself as a representation. For example, when 
you are reporting on how you spent a day in a town with a map, you might draw 
a line that shows the route you actually took on the map. In doing so, you are 
aligning the drawing event with the walking event to express the latter dynami- 
cally. Our hypothesis is that simultaneous drawing is unlikely while this strategy 
of event alignment is employed. Under this condition, the movement or process 
of drawing is the main carrier of information. The trace of drawing has only 
a subsidiary informational role. Thus, in this particular use of drawing, its per- 
sistency is largely irrelevant. The message must be comprehended and grounded 
in real time, and bandwidth afforded by the drawing surface becomes irrelevant. 
This requirement effectively prohibits the occurrence of any other simultaneous 
drawing. 

An analysis of the corpus gathered from collaborative problem-solving tasks
demonstrates that these two activity-dependent constraints can override the
resource characteristics of the media, thereby enforcing a sequential turn
organization similar to that observed in verbal interactions (Umata et al., 2003).

The question is, do these activity-dependent constraints have no effect on 
turn organization in speech? Previous studies observed that the speech turns 
should be sequential; however, no study has compared speech turn organizations 
in different task settings. If speech turn organization is also significantly affected 
by activity-dependent constraints, speech may also have some potential of par- 
allel turn organization, but not as much as graphics turn organization has. Or, 
it may be the case that the resource-dependent constraints hypothesis is wrong 
after all. 

In the following part of this paper, we will analyze the speech corpus from 
the collaborative problem-solving task data gathered in Umata et al. (2003). We 
will compare the speech turn organization patterns in different task settings to 
see if activity-dependent constraints affect speech turn organization or not. 



3 Method 

An experiment in which subjects were asked to communicate graphically was 
conducted to examine the effect of the two factors presented above on their 
interaction organization. In these experiments, 24 pairs of subjects were asked 
to work collaboratively on four problem-solving tasks using virtual whiteboards. 

3.1 Experimental Setting 

In the experiments reported here, two subjects collaboratively worked on four 
different problem-solving tasks. All the subjects were recruited from local uni- 
versities and were paid a small honorarium for their participation. The subjects 
were seated in separate, soundproof rooms and worked together in pairs using 
a shared virtual whiteboard (50 inches) and a full duplex audio connection. 
The subjects were video-taped during the experiment. They also wore cap-like 
eye-tracking devices that provided data indicating their eye-gaze positions. The
order in which the tasks were presented was balanced between the 24 pairs so
that the presentation order would not have an effect on the results. The time
limit for each task was six minutes. 

At the start of each task, an initial diagram was shown on both subjects’
whiteboards, and the subjects were then free to speak to one another and to
draw and erase on the whiteboard. The only limitation to this drawing activity 
was that they could not erase or occlude the initial diagram. All drawing activ- 
ity on the whiteboard was performed with a hand-held stylus directly onto the 
screen, and any writing or erasing by one participant appeared simultaneously 
on their partner’s whiteboard. The stylus controlled the position of the mouse
pointer and, when a subject was not drawing, the position of that subject’s mouse
pointer was displayed on the partner’s whiteboard.

3.2 Tasks 

Deduction Task with an Event Answer (1e). A logical reasoning problem
with a correct answer. The problem asks that the subjects describe the arrange- 
ment of people around a table and the order in which people sit down. This 
seating arrangement and order must satisfy some restrictions (e.g., “The fifth 
person to sit is located on the left-hand side of person B.”). A circle representing 
a round table was shown on the whiteboards at the start of the task. This task 
has strong informational dependency and strong event alignment. 



Deduction Task with a State Answer (1s). A logical reasoning problem
with a correct answer asking that the subjects design a seating arrangement 
satisfying some restrictions (e.g., “S cannot sit next to M.”). A circle representing 
a round table was shown on the whiteboards at the start of the task. This task 
has strong informational dependency and loose event alignment. 



Design Task with an Event Answer (2e). A task with an open-ended 
answer, asking subjects to make an excursion itinerary based on a given town 
map. A complete town map was shown on the whiteboards at the start of the 
task. This task has weak informational dependency and strong event alignment. 



Design Task with a State Answer (2s). A task with an open-ended answer, 
asking the subjects to design a town layout to their own liking. An incomplete
town map was shown on the whiteboards at the start of the task. This task has 
weak informational dependency and loose event alignment. 
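
The four tasks form a 2 x 2 design over the two sequentiality conditions. The following minimal sketch encodes that design and applies the Sequentiality Constraints to it. The class and function names are hypothetical, and treating "strong" versus "weak/loose" as a boolean is a simplifying assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    code: str
    strong_information_dependency: bool  # Information Dependency Condition
    strong_event_alignment: bool         # Event Alignment Condition


# Task properties as stated in the task descriptions above.
TASKS = [
    Task("1e", True, True),    # deduction, event answer
    Task("1s", True, False),   # deduction, state answer
    Task("2e", False, True),   # design, event answer
    Task("2s", False, False),  # design, state answer
]


def sequential_turns_mandatory(task: Task) -> bool:
    """Sequential drawing turns are expected if either condition holds."""
    return task.strong_information_dependency or task.strong_event_alignment


# Tasks in which the constraints do not rule out simultaneous drawing.
parallel_allowed = [t.code for t in TASKS if not sequential_turns_mandatory(t)]
```

Under this encoding, only the design task with a state answer (2s) leaves room for parallel drawing turns.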



3.3 Data 

During each task, all drawing, erasing, and mouse movements by each subject 
were recorded in a data file. Using this data, the amount of simultaneous drawing 
was calculated as the time spent drawing simultaneously, expressed as a percentage
of the total time either subject spent drawing (i.e., the sum of the time intervals
in which both subjects drew simultaneously divided by the sum of the time
intervals in which at least one member of the pair drew on the whiteboard). Speech
was recorded with video-data and labeled by hand. As with the drawing data, 
the amount of simultaneous speech was calculated as the percentage of the total 
time spent talking simultaneously of the total time either subject talked. 
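
The measure described above can be sketched as an interval computation. This is a minimal illustration, assuming that each subject's activity is available as a list of (start, end) time intervals in seconds; the function names and the interval representation are assumptions and are not taken from the original data files.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge one subject's intervals into sorted, non-overlapping intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total(intervals: List[Interval]) -> float:
    """Total duration covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intervals in which both subjects are active (inputs sorted, disjoint)."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def simultaneous_ratio(subject_a: List[Interval], subject_b: List[Interval]) -> float:
    """Time both were active divided by time at least one was active."""
    a, b = merge(subject_a), merge(subject_b)
    both = total(intersection(a, b))
    either = total(a) + total(b) - both
    return both / either if either > 0 else 0.0
```

For example, with the made-up intervals `[(0, 4), (10, 12)]` and `[(2, 6), (11, 15)]`, the subjects overlap for 3 seconds out of 11 seconds of activity, so `simultaneous_ratio` returns 3/11. The same function applies to speech intervals.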

4 Results 

4.1 The Effects of Activity-Dependent Constraints 

As shown in Figure 1, the proportion of simultaneous utterance time to total
utterance time did not differ significantly across conditions. These
data were entered into a 2 x 2 Analysis of Variance (ANOVA). Both problem
type (deduction and design) and solution type (state and event) were treated
as within-subject factors. No effects were found for either the problem type or
solution type, and analyses showed no interaction (all Fs < 1).

This is quite different from the pattern of simultaneous drawing shown in 
Figure 2 (Umata et al. (2003)). The proportion of simultaneous drawing time to 
total drawing time was the largest in the design state (2s) condition. This data 
was entered into a 2 x 2 ANOVA. Both problem type (deduction and design)
and solution type (state and event) were treated as within-subject factors.
Analyses revealed a main effect of problem type, F(1, 23) = 16.33, p < .001, and
solution type, F(1, 23) = 16.20, p < .001, which was qualified by an interaction,
F(1, 23) = 6.61, p < .01. This interaction was caused by an effect of solution
type on the design task, F(1, 23) = 42.88, p < .001. No effects were found for the
solution type on the deduction task, F(1, 23) = 2.13, p < 1, nor for the problem
type on the event solution, F < 1.
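
In a fully within-subject 2 x 2 design, each effect has one degree of freedom, so its F statistic equals the squared one-sample t statistic of a per-pair contrast score, with n - 1 error degrees of freedom (23 for 24 pairs). The sketch below illustrates that equivalence. It is not the authors' analysis code, the function names are hypothetical, and the four data rows are invented purely to make the example runnable.

```python
import math
import statistics
from typing import Dict, List, Tuple

# One row per pair: simultaneous-drawing percentage in tasks (1e, 1s, 2e, 2s).
Row = Tuple[float, float, float, float]


def contrast_f(scores: List[float]) -> float:
    """F(1, n-1) for a one-df within-subject contrast: squared one-sample t."""
    n = len(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("contrast scores have zero variance")
    t = statistics.mean(scores) / (sd / math.sqrt(n))
    return t * t


def anova_2x2_within(rows: List[Row]) -> Dict[str, float]:
    """Main effects and interaction of a 2 x 2 repeated-measures design."""
    # Problem type: design (2) minus deduction (1), averaged over solution type.
    problem = [(e2 + s2) / 2 - (e1 + s1) / 2 for e1, s1, e2, s2 in rows]
    # Solution type: state (s) minus event (e), averaged over problem type.
    solution = [(s1 + s2) / 2 - (e1 + e2) / 2 for e1, s1, e2, s2 in rows]
    # Interaction: difference between the two state-minus-event differences.
    interaction = [(s2 - e2) - (s1 - e1) for e1, s1, e2, s2 in rows]
    return {
        "problem": contrast_f(problem),
        "solution": contrast_f(solution),
        "interaction": contrast_f(interaction),
        "df_error": len(rows) - 1,
    }


# Hypothetical data for four pairs; NOT the values from the experiment.
HYPOTHETICAL_ROWS: List[Row] = [
    (2, 4, 6, 14),
    (3, 5, 5, 15),
    (2, 6, 7, 16),
    (3, 5, 6, 15),
]
result = anova_2x2_within(HYPOTHETICAL_ROWS)
```

The simple effects reported above (for example, solution type within the design task) follow the same pattern, using the contrast `s2 - e2` for each pair.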

Thus, it was shown that activity-dependent constraints have no effect on 
speech turn organization, whereas simultaneous drawing tends to be blocked 
when the task has either strong informational dependency or tight event align- 
ment, or both. 

4.2 Distribution of Simultaneous Turns 

Table 1 shows the mean duration of overlapping utterances by subject in all
conditions. Table 2 shows the mean duration of simultaneous drawings by subject
in all conditions. Because the two modalities differ in character,
we cannot draw any firm conclusion from a simple comparison of the
percentages of simultaneous utterances and those of simultaneous drawings. It
is worth noting, however, that the amount of simultaneous drawing is even
smaller than that of simultaneous utterances when drawing is affected by the
activity-dependent constraints.



Fig. 1. Amount of simultaneous utterances



Table 1. Distribution of simultaneous utterances

             Deduction (1)   Design (2)
Event (e)    12.1%           12.1%
State (s)    12.4%           11.3%



Table 2. Distribution of simultaneous drawings

             Deduction (1)   Design (2)
Event (e)    2.7%            6.0%
State (s)    5.2%            15.2%
Fig. 2. Amount of simultaneous drawings

4.3 Other Findings: Cross-Modal Overlaps 

The multimodal communication settings of this experiment provided the subjects 
with yet another possible pattern of simultaneous interaction, namely drawing- 
speech overlaps. These two modalities are expected to hardly interfere with each 
other when they overlap, because of the independent communication channels 
and the persistence of drawings. The ratio of drawing-speech overlap time divided 
by the total drawing time throughout the tasks was 39.3%. A simple comparison
is not appropriate in this case either, but it is worth noting that this figure is
much higher than those for simultaneous speech and simultaneous drawing.

Because of the cross-modal nature, there were two types of drawing-speech
overlaps: self-speech overlaps and partner’s speech overlaps. As is shown in
Figure 3, self-speech overlaps are much more frequent than partner’s speech
overlaps. These data were entered into an ANOVA. Analysis revealed a main effect of
overlap type, F(1, 191) = 59.981, p < 0.01.

Thus, it was shown that overlapping self-drawings with verbal utterances is 
easier than overlapping partner’s drawings. 



Fig. 3. Distribution of cross-modal overlaps 



5 Discussions 

We observed that speech turn organization is unaffected by the activity-dependent
constraints that affect drawing turn organization patterns. This is
a supporting result for the traditional account, which attributes the sequential 
speech turn organization to the resource characteristics of media. This suggests 
that speech turn organization is already restricted by resource-dependent con- 
straints, and activity-dependent constraints cause no additional effect. 

Although a direct comparison of the frequencies of simultaneous utterances
with those of simultaneous drawings cannot lead to any firm conclusion,
the relatively low frequency of simultaneous drawings under the effect
of activity-dependent conditions is still difficult to interpret.

One possible cause is that drawing turns are restricted by yet other factors 
in the experiment. The subjects were told that their problem-solving processes 
would be video-taped, and that their answers would be evaluated through the 
recorded processes. The subjects tended to show their final results on the white- 
boards they were working on. Many pairs actually straightened up their draw- 
ings after agreeing on their answers. In such circumstances, drawing was not only 
a communication medium but also the purpose of the task itself, whereas speech 
served solely as a communication medium. The subjects often asked permission 
from their partners to draw something on the whiteboard before they actually 
drew it. 

Whittaker et al. (1991) also observed similar phenomena through the ex- 
amination of shared whiteboard communication with and without the addition 
of a speech channel. They found that a permanent medium such as a whiteboard
provides users with space for constructing shared data structures around which
they can organize their activity. With the addition of a speech channel, people
used the whiteboards to construct shared data structures that made up the
CONTENT of the communication, while speech was used for coordinating the
PROCESS of communication. The researchers provided a brain-storming task
and a calendar coordination task for the users, and it is quite likely that the
whiteboards were used for recording their conclusions and that the drawings
were not only a communication medium but also the aim of the task.

Utterances coordinating drawing activities are typically found in tasks with 
sequentiality constraints. Figure 4 is a snapshot from the deductive state task
(1s). Subjects A and B have just agreed to fix M’s seat first, and A suggests
“M’s seat should be ... here, right?” while drawing the sign M. Then, B gives verbal
acknowledgement, “Yes.” Here, A’s utterance serves as a signal for his drawing
activity.

Such verbal coordinations of drawing activities are also found in our design 
state task (2s), where simultaneous drawings are the most frequent. Figure 5 
shows a snapshot of the drawing interaction that took place between two collab- 
orators working on the design state task (2s). The upper row and the lower row
indicate the drawing behaviors of subjects A and B, respectively. A verbally
suggested making a botanical garden in the woods. B answered, “Well, it’s a nice
idea,” and then started drawing a woods icon right after asking permission to
draw a woods icon on the left-hand side of the whiteboard. A, who had almost begun
drawing a woods icon on the right-hand side of the whiteboard, stopped and pulled
her hand away from the whiteboard. She then produced verbal acknowledgement
of B’s drawing action.

However, simultaneous drawings occur in the design state task (2s) even 
where verbal coordinations of drawing occur. Figure 6 shows one such case. 
The subjects A and B agreed to divide the design task into two sub-tasks, the 
design of a station plaza and that of a park. Then, A says “Station,” and B says 
“I’ll make forest,” before starting their drawing activities. Here, they verbally 
coordinate their simultaneous drawing activity. 

It is quite likely that such verbal coordinations reinforce the sequential turn 
organization in drawings and work as restricting factors against simultaneous 
drawings, especially in the cases with sequentiality constraints. The mechanism 
of the coordinations across different modalities is, however, not clear. More work 
is required to demonstrate how the two modalities interact and when verbal 
coordinations of drawing activities happen. 

6 Conclusions 

So far we have analyzed speech and graphics turn organizations based on the data 
of collaborative task solving settings. We found that these activity-dependent 
constraints do not impact speech turn organization, but do affect drawing turn 
organization. This serves as support for the traditional view on speech turn or- 
ganization, which attributes sequentiality of speech turns to resource-dependent 
Fig. 4. Sequential drawing interaction coordinated verbally (1)




Fig. 5. Sequential drawing interaction coordinated verbally 





Speech and Graphical Interaction in Multimodal Communication 327 




Fig. 6. Parallel drawing interaction coordinated verbally 



conditions, demonstrating that speech turns are already restricted by their re- 
source characteristics. 

Simultaneous drawings are generally not as frequent as one would expect, 
however. They are even less frequent when they are affected by activity- 
dependent constraints. One possible cause for this phenomenon is the functional 
difference between these two modalities in the task settings. Drawings were not 
only a communication medium but also the aim of the task itself, while speech 
served solely as a communication medium. This could be regarded as another 
possible factor that restricts the parallel turn organization in collaborative work 
situations. Further studies are required to elucidate the mechanism of verbal 
coordinations of drawing activities. 

Drawing-speech overlaps are comparatively frequent, and self-overlaps are 
much more frequent than partner’s overlaps. This suggests the possibility that 
turns in communication tend to be kept across speech and drawing modalities. 

Overall, these findings indicate that turn organization in communication can 
be either sequential or parallel, depending on the resource characteristics of 
modality, the activity people engage in, and possibly the function the communi- 
cation media bears. 
Abstract. We discuss the use of diagrams in support of empirical research.
In particular, we try to find out whether an analysis of a diagram that
represents the results from a series of qualitative studies can be used to
formulate subsequent (quantitative or qualitative) research.



Introduction. [1] focuses on the diagrammatic representation of research results
from qualitative research. In the same paper, an attempt was made to
detect those parts of a diagram that suggest subsequent quantitative research.
In the present paper, we will elaborate on this issue. We focus on the question:
Which substructures of a point-and-arrow diagram representing research results
are suited for the formulation of new research hypotheses?

Example. As an example, research initiated at the Department of Social
Medicine at the VU University medical center will be used. The study con- 
centrated on the decision to start or forego artificial administration of fluids and 
food (AAFF) to nursing home residents with dementia. The results of the project 
have been described in several articles. 

In [3], the objective was to get more insight into the practice of starting or
forgoing AAFF. The deterioration caused by dementia turned out to be a normal 
process in nursing home residents and seemed to be no ground to initiate AAFF. 
However, an event of acute illness, like pneumonia, is influential in initiating 
AAFF. [2] focus on the decision making process proper: they found that there
are several groups of participants: (i) the nursing home physician, (ii) the nursing
staff and (iii) the family of the resident, each having a distinct role in the
decision making process. (a) Since patients that are demented are unable to give
informed consent to medical treatment, the family is formally considered to play
an important role in the reconstruction of the wishes of the patient; (b) The role
of the nursing home physician resembles that of a ‘stage manager’ since he or
she has an overriding say in the final decision. Generally, physicians try to create
the broadest possible basis for decision making by involving nurses and family
throughout.
Fig. 1. Result diagram of the decision making process. Ri: Nursing home
resident (indices indicate time points); NHP: Nursing home physician



The decision on AAFF. Figure 1 describes the decision making process from 
the perspective of the patient according to the findings indicated in the previous 
section. (Note that the diagram starts at a later, critical phase in the process 
of dementia. In some cases, the decision on AAFF may cause the patient to
improve and return to an earlier stage.)

The figure incorporates the finding that the nursing home physician has a very 
active role in assessing the condition of the patient and later on, in the decision 
to either withhold or administer fluids and food. For the assessment of the
patient’s condition the NHP relies on the practical experience of the nursing 
staff. When the final decision on AAFF has to be taken, feelings and opinions 
of the patient’s family become important. 



Which parts of a result diagram suggest new research? We take the 
result diagram (Diag. 1) as a starting point. 

Generally speaking, there are five aspects of a result diagram that are
particularly influential in the generation of new research hypotheses:
Context. Most diagrams are meaningful only in a context. In Figure 1, ‘decide’
is the essential process of the study, but this can only be understood from the
context.



Inheritance. Diagrams often specify a single element of an enclosing set. In 
our case, the basic entity in the diagram is ‘nursing home resident’ (Ri). If the
group to which the resident belongs is not homogeneous but consists of subgroups
with different attributes, the diagram may have to include attributes of these 
subgroups. 



Additional information may be needed for specification. Those parts of 
a diagram that cannot be fully specified since insufficient information is available, 
may suggest further investigation. 



Temporal processes. Another issue suggested by Figure 1 is the temporal 
order suggested in the diagram by the indexes of the resident boxes. It may be 
of great interest to study the development suggested by the diagram in greater 
detail. 



Branching. The lower part of Fig. 1 describes the two decisions that can be
taken and their possible effects. (Note that there are four possible transitions from
resident’s state four (R4) to resident’s state five (R5).) In the diagram this is
represented by a so-called Chooser (the little circle).

Although AAFF is commonly con- 
sidered to decrease the patient’s risk 
of dying, in some cases this can not be 
prevented: when withholding AAFF, 
the possibility that the patient will 
die, is not excluded. On the other 
hand, in case AAFF is withheld but 
the patient survives, this may have ir- 
reversible effects on the patient. 

Therefore, a subsequent quantitative 
study could be directed to assess the 
prevalence of these discrepant outcomes and the consequences for the decision 
process. In the present case, the following research question can be formulated: 
What is the incidence of death and of recovery among nursing home patients after
a decision on AAFF has been taken and feeding has been started or forgone? In
particular, how often do unexpected developments occur?

In general, choosers in a result diagram often suggest subsequent research to be 
able to place incidences on the different branches. 
Discussion and Conclusion. The message of the present article is that dia-
grams can help to represent the results of a research effort and show the structure 
of the information gained from the subject matter field. 

Although it is easy to derive research hypotheses from a diagram that rep- 
resents the results of qualitative or quantitative research, it is much more dif- 
ficult to find a corresponding research design that offers some guarantee that 
the question can be answered. When restricted to quantitative research, much of 
the process is common knowledge between researchers, although different fields 
may use different research paradigms. For qualitative research the situation is 
somewhat less pronounced. 

Using diagramming techniques to represent the results of a study may provide
additional insight into the original research question. In particular, the
structure of the subject matter domain and developmental aspects can be clearly
depicted. 

The main message of this paper has been that when results from qualitative 
research are representable in a diagram, it is easy to derive new research hy- 
potheses from it. This may be particularly useful when qualitative methods are 
applied at the beginning of the research cycle (for instance, in a pilot phase) and 
the subsequent research will be quantitative. In this way, a diagram can serve 
to bridge the often mentioned ‘gap’ between the two research forms. 
Abstract. A pilot study was designed to investigate the plausibility of
construing fictive motion from function lines in Cartesian graphs. Participants
(n=18) were presented with a series of line graphs and required to judge which
of two lines expressed the greatest rate of change in the value of Y. Some of
the graphs had arrows pointing in a direction of the line that was either 
consistent (consistent condition) or inconsistent (inconsistent condition) with 
change progressing from the origin of the horizontal axis. Graphs in the neutral 
condition had no arrows. It was hypothesized that if users construe fictive 
motion when interpreting change of function lines then (a) inconsistent arrows 
should detrimentally interfere with the judgments and (b) consistent arrows 
should facilitate the judgments. Results were as predicted. Response times in 
inconsistent trials were slower than the neutral condition and for consistent 
trials response times were faster than the neutral condition. 



1 Introduction 

Studies of graphical cognition are particularly well suited to investigate issues
concerning the embodiment of conceptual representations. Two issues are of
particular interest: the boundaries between conceptual and perceptual representations
[1], and the extent to which image-schemas derived from basic sensory-motor
experiences are employed to reason metaphorically about abstract domains [3,4].

Graphical representations can be analyzed as metaphoric blends comprising
image-schematic mappings. One transparent example is the function line of Cartesian
graphs, which represents a metaphorical trajectory [3]. In conceptual metaphor theory this
is the Source-Path-Goal (SPG) image-schema. The image-schema represents the trace
of a trajector moving along a trajectory. According to Lakoff & Núñez [3], graphical
function lines are sometimes conceptualized in terms of what Talmy refers to as
‘fictive motion’ [5]. Fictive motion refers to cases where motion is construed of static
referents, as implied by expressions like “this road goes all the way to the Diagrams
Conference”.

A pilot study was designed to investigate the plausibility of the SPG schema and 
the construal of fictive motion as mechanisms for understanding change in function
lines. Previous research suggests constraints on the way different axes of a graph are
conceptualized [2]. Informal analysis suggests that one role of the horizontal axes is
to reference the unidirectional change of the function line progressing from its origin. 
It was further reasoned that if users construe fictive motion when interpreting change 
in function lines of graphs then arrows that are inconsistent with this direction of
change should detrimentally interfere with judgments whereas arrows that are 
consistent should facilitate judgments. 



2 Method 

Participants were eighteen postgraduate students of the Department of Informatics at 
the University of Sussex. Participants were presented with a series of line graphs on
a computer monitor. Each graph consisted of a green and red function line. 
Participants were required to judge which line expressed the greatest rate of change in 
the value of Y (vertical axes) as a function of the value of X (horizontal axes). There 
were three experimental conditions (Neutral, Consistent, and Inconsistent), each
corresponding to a different level of the independent variable. In the Consistent
trials, the graphs presented had arrows located on the function lines pointing in the
direction of the line consistent with change progressing from the origin of the 
horizontal axes. Inconsistent trials were the same as consistent trials except that 
arrows pointed in the opposite direction. In neutral trials arrows were absent. Figure 1 
shows the same graph under the three experimental conditions. Responses were 
recorded by pressing a red or green key on the computer keyboard. Subjects were 
instructed to respond as quickly and accurately as possible. Before each experimental
session began, participants completed a short pencil and paper task that involved rate 
of change judgments followed by a set of practice trials. 

Graphs were designed in a systematic way to counterbalance the effects of
different visual dimensions of the function lines. Half of the graphs expressed positive
relations and half expressed negative relations. The same set of graphs was used for
each experimental condition; the graphs differed only in the form of the arrows
corresponding to the level of the independent variable.



3 Results 

One participant was excluded because of a high number of incorrect responses.
Correct responses for the remaining sample were above 95%. Mean response times
for correct responses were computed for each participant in each of the three
conditions. As predicted, the response time in milliseconds for consistent trials
(M = 1456, SD = 405) was faster than for neutral trials (M = 1503, SD = 400), which in turn was
faster than for inconsistent trials (M = 1555, SD = 429). The data were analyzed with a
repeated-measures ANOVA, which revealed a statistically significant main effect
[F(2, 15) = 5.64, p < .01]. One-tailed t-tests were used for pairwise comparisons. The
difference between inconsistent and consistent trials was statistically significant
(t(16) = 3.58, p < .001), as was the difference between inconsistent and neutral trials
(t(16) = 1.91, p < .05). However, the difference between consistent and neutral trials did
not reach statistical significance (t(16) = 1.41, p = .09).

Fig. 1. Experimental conditions from left to right: consistent, neutral and inconsistent
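
As an aid to reading the pairwise comparisons in the Results, the following is a minimal sketch of how a one-tailed p-value can be obtained from a t statistic and its degrees of freedom. It is not the analysis software used in the study; the function names `t_pdf` and `one_tailed_p` and the choice of Simpson's rule are assumptions made for illustration, and values computed this way may differ slightly from the reported ones because the published t values are rounded.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t, df, steps=2000):
    """Upper-tail probability P(T >= t) for t >= 0.

    Integrates the density over [0, t] with Simpson's rule and subtracts
    the result from 0.5 (the mass of the upper half of the distribution).
    """
    if t < 0:
        raise ValueError("t must be non-negative for this sketch")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


if __name__ == "__main__":
    # t values taken from the text; df = 16 (17 participants, paired design)
    for t_value in (3.58, 1.91, 1.41):
        print(t_value, round(one_tailed_p(t_value, 16), 4))
```

The one-tailed form matches the directional predictions stated in the introduction; a two-tailed value would simply be twice the result.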



4 Discussion 

The results of the experiment are in line with the predictions. Trends in the data 
suggest consistent arrows facilitated judgments and inconsistent arrows inhibited 
them. One interpretation of these effects is in terms of the construal of fictive motion. 
The graph lines express metaphoric trajectories that have direction, a beginning and 
an end. We suggest that mapping successive changes in a line to an internal 
representation that codes dimensions of change requires a specific form of cognitive
mechanism and representation. Higher-level mechanisms involved in motion 
interpretation and schemas that represent motion are likely candidates. 

If this interpretation is correct, then the following provides a reasonable
explanation. We propose that the direction cues interfere with the binding of a default
direction filler to the SPG schema in the context of graph function lines (this is not
meant to imply a propositional list structure). We imagine the binding of slots to fillers as a
constructive process arising through competition. When users comprehend the
consistent trials, the internal representation of the direction cue becomes conflated
with the direction slot, which facilitates the binding of the default filler for
direction. When users view the inconsistent trials, the internal representation of the
direction cue conflicts with the default binding; this extra competition needs to be
resolved, which slows the process of binding the default slot.

This explanation is tentative. We do not yet have good reason to rule out that the 
effect may derive from arbitrary reading conventions. One possibility is that the 
direction cues indirectly interfere with procedures for directing eye-movements in 
graphs and these procedures have no impact on the way the lines are conceptualized. 
These are issues for future studies.



Abstract. Students studied materials about the human heart and circulatory
system using either (a) text only, (b) text with simple diagrams, or (c) text with 
detailed diagrams. During learning, students self-explained [1] the materials. 
Explanations were transcribed, separated into propositions, and analyzed 
according to the type of learning process they represented. Results 
demonstrated that diagrams promoted inference generation but did not affect 
other learning processes (such as elaboration or comprehension monitoring). 
However, only simple diagrams promoted generation of inferences that 
integrated domain information. Results indicate that diagrams may be useful 
because they guide the learner to engage in the cognitive processes required for 
deep understanding. 



1 Rationale and Experimental Approach 

In recent years there has been growing interest and enthusiasm regarding the addition 
of visual resources to educational materials. The overall conclusion of previous 
research on text with pictures is that the addition of visual material improves students' 
memories and understanding [2]. In addition, Mayer and his colleagues have 
identified a number of principles that describe situations in which multimedia 
materials are most effective [3]. Although such principles are useful in identifying 
conditions that can maximize multimedia benefits, there is little evidence as to why 
diagrams improve memory and learning. The goals of this research were to determine: 
(a) if the comprehension processes of learners using text and diagrams were different 
from learners using text only, and (b) whether diagram complexity would influence 
comprehension processes. 

A simple text about the heart and circulatory system was used alone or in
conjunction with a series of diagrams of one of two types: diagrams that were
simplified to emphasize the functional aspects of the heart (see Figure 1a), or more
detailed diagrams that depicted the correct anatomy of the heart in addition to its
functional aspects (see Figure 1b).

During learning, students self-explained the materials [1] and the resulting verbal 
protocols were separated into a series of complex propositions [4]. Two raters scored 
propositions as: paraphrases (statements that reflected information from the current
materials), elaborations (connections to prior knowledge), monitoring statements
(utterances that demonstrated comprehension monitoring), or self-explanation 
inferences (statements that went beyond current materials or that integrated current 
and previous material). Because self-explanation inferences included multiple types 
of inferences, they also were separated into subcategories: path inferences (inferences 
that concerned the path of blood through the heart and circulatory system), nonpath 
inferences (inferences concerning content other than the path of blood), and 
integration inferences (inferences that connected current and previous information; 
these included path or nonpath inferences that integrated information). Interrater 
reliability ranged between .70 and .99 (M = .88). Additional work assessed students'
mental models of the domain and the relationship between inference generation and 
mental model development [5]. Students drew and explained the heart and circulatory 
system before and after learning; these materials allowed mental model assessment 
and were scored for accuracy according to criteria from Chi et al. [1]. 




Fig. 1. (1a) A simple diagram emphasizing functional relations; (1b) a detailed diagram
preserving anatomical accuracy



2 Results 

As seen in Fig. 2, students using either type of diagram made significantly more
self-explanation inferences than students using text only (F(2, 22) = 9.3, p < .01), but did not
differ in the number of other propositions uttered (F < 1).

Students using either diagram also made significantly more path (F(2, 22) = 5.0, p <
.02) and nonpath inferences (F(2, 22) = 8.3, p < .01) than students who used text only;
however, only path inferences were tied to formation of the correct mental model of
the domain. Students who formed the correct mental model of the domain made
significantly more path inferences during learning than students who failed to form
the correct model (F(1, 23) = 5.0, p < .04). Path inferences may be related to mental
animation processes; diagram cues could prompt mental animation and allow the
student to draw path inferences. Current results suggest that diagrams promote inference
generation and that some inferences support mental model development.

Finally, integration inferences demonstrated an interesting pattern. Students who
saw simple diagrams generated more integration inferences (F(2, 22) = 4.9, p < .02) than
students using text only (M difference = 17.5, LSD = 11.9, p < .01) and also tended to
generate more of these inferences than students using detailed diagrams (M difference
= 11.0, LSD = 11.5, p = .06). As with path inferences, students who formed the
correct mental model of the domain generated significantly more integration
inferences than students who failed to form the correct model (F(1, 23) = 6.1, p < .03).
Thus, not all diagrams support comprehension processes equally.
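
For readers less familiar with the LSD values quoted above, Fisher's least significant difference for comparing two group means is conventionally written as follows. The symbols here (the error mean square, the group sizes n_i and n_j, the error degrees of freedom and the level alpha) are generic textbook notation, not quantities reported in this paper.

```latex
\mathrm{LSD}_{ij}
  = t_{\alpha/2,\; df_{\mathrm{error}}}
    \sqrt{ MS_{\mathrm{error}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right) },
\qquad
\text{reject } H_0 \text{ when } \lvert M_i - M_j \rvert > \mathrm{LSD}_{ij}
```

Assuming the LSD values were computed at alpha = .05, this reading is consistent with the pattern reported: the difference of 17.5 exceeds its LSD of 11.9, whereas the difference of 11.0 falls just short of its LSD of 11.5. The two LSD values can differ because the threshold depends on the sizes of the two groups being compared.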



[Figure 2: bar chart. Vertical axis: mean number of propositions generated (ticks from 20 to 200). Horizontal axis: Type of Complex Proposition (Other Statements, Self-Explanation Inferences). Legend: Text Only, Simple Diagrams, Detailed Diagrams.]

Fig. 2. Mean number (+Standard Error) of other statements and self-explanation inferences
generated by students in each experimental condition



3 Conclusion 

This research suggests that diagrams are effective when they prompt learners to 
engage in the cognitive processes necessary for deep understanding. Results also 
demonstrated that differences in diagram representation can affect comprehension 
processes. In this situation, simple diagrams most effectively guided learners. 
However, one must be careful not to misinterpret this finding. In other situations, 
different types of diagrams or visual cues may be effective if they successfully prompt 
inference generation without adversely affecting other processes.



Abstract. TRACS (Tool for Research on Adaptive Cognitive Strategies) is a
new family of card games played with a special deck. Each card in the deck is a
double-sided diagram, where the back gives a clue to the front. Compared to 
standard card games, this clue/truth structure makes TRACS more tractable to 
theoretical investigations in the lab and more typical of practical situations in 
the world. Here I present the design of the deck and discuss some research 
results. 



1 Introduction 

Card games are useful for studying human judgment and decision making because 
they simulate the probabilistic and dynamic conditions of “naturalistic” (real world) 
situations. Unfortunately, with a standard deck of 52 cards, most games are too 
complex for analytical or numerical solution, and this makes it difficult to establish 
normative benchmarks for cognitive performance. Furthermore, with no information 
on one side (back) and all information on the other side (front), standard playing cards 
fail to capture the clue/truth structure of many real world problems. 

TRACS (Tool for Research on Adaptive Cognitive Strategies) [1] is a new family 
of games played with a special deck of double-sided cards (Fig. 1). The deck has a 
non-uniform distribution of six clue/truth (back/front) card types, and this structure 
allows players to infer the likely truth (front) from a given clue (back). 

Compared to standard games played with single-sided cards, TRACS offers 
advantages of theoretical rigor and practical relevance. With respect to rigor, the 
backs of the cards provide partial information to constrain the possible game states, 
and this makes the games more tractable to mathematical analysis of optimal 
solutions. For example, a face-down card in a standard deck can be any one of 52
cards; but in TRACS the card can be one of only six types, and the front (truth) is 
further constrained by the back (clue). With respect to relevance, the back/front 
(clue/truth) structure of the TRACS cards simplistically simulates the practical 
problem of inferring naturalistic truths from probabilistic clues. This is a basic 
problem in many applications, such as military intelligence (target identification from 
radar image) and medical diagnosis (tissue identification from x-ray image). 

With this novel structure of the deck, TRACS offers a useful blend of rigor and 
relevance for research on human judgment and decision making. Below I provide 
examples of a rigorous investigation and a relevant application.



2 Discussion



The basic problem in playing TRACS is that the probabilistic distribution of card 
types (Fig. 1) changes in a dynamic fashion as cards are turned over to reveal their 
fronts. This is similar to the problem posed by standard card games like Blackjack, 
except that with standard cards the backs of the cards provide no information to help 
one make a decision, e.g., to draw a card (or not), or to choose one card over another. 

Many games can be played with the TRACS cards [1]. The simplest is a solitaire 
game called Straight TRACS, where on each turn the player is presented with a face- 
up card (color) flanked by two face-down cards (black shapes). The object is to turn 
over the black shape (left or right card) that is most likely to match the color of the 
card in the middle. The challenge is to count cards and update odds as the deck is 
depleted in play. For example, at the start of the game (with a full deck), a triangle is 
more likely than a circle to turn out Red, i.e., by odds of 3:2 as illustrated by the 
diagram on the front of each Red card (Fig. 1). But later in the game, after some 
triangles and circles have been turned over and removed from play, a circle may be 
more likely than a triangle to turn out Red. 
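
The card-counting problem described above can be sketched in a few lines. The deck composition below is an assumption made for illustration: the text states only that there are 24 cards of six clue/truth types and that triangles are favoured over circles for Red by 3:2, and the counts in `ASSUMED_DECK` are one composition consistent with that. The function names are likewise hypothetical.

```python
from collections import Counter
from fractions import Fraction

# ASSUMED composition (hypothetical): (back shape, front color) -> count.
# Chosen to total 24 cards with Red triangles : Red circles = 6 : 4 = 3 : 2.
ASSUMED_DECK = Counter({
    ("triangle", "red"): 6, ("triangle", "blue"): 2,
    ("circle", "red"): 4, ("circle", "blue"): 4,
    ("square", "red"): 2, ("square", "blue"): 6,
})


def p_color_given_shape(deck, shape, color):
    """P(front is `color` | back is `shape`) among cards still in the deck."""
    remaining = sum(n for (s, _), n in deck.items() if s == shape)
    if remaining == 0:
        raise ValueError(f"no {shape} cards left in the deck")
    return Fraction(deck[(shape, color)], remaining)


def remove_card(deck, shape, color):
    """Return a new deck with one revealed card taken out of play."""
    if deck[(shape, color)] <= 0:
        raise ValueError(f"no {color} {shape} left to remove")
    updated = deck.copy()
    updated[(shape, color)] -= 1
    return updated


if __name__ == "__main__":
    deck = ASSUMED_DECK
    print(p_color_given_shape(deck, "triangle", "red"))
    print(p_color_given_shape(deck, "circle", "red"))
    # Hypothetical play: four Red triangles and two Blue circles are revealed.
    for card in [("triangle", "red")] * 4 + [("circle", "blue")] * 2:
        deck = remove_card(deck, *card)
    print(p_color_given_shape(deck, "triangle", "red"))
    print(p_color_given_shape(deck, "circle", "red"))
```

Under the assumed composition, a triangle starts at 3/4 and a circle at 1/2 for Red; after the hypothetical removals the triangle drops to 1/2 and the circle rises to 2/3, which is the kind of reversal the text describes.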

Initial experiments on Straight TRACS [2], [3] showed that human subjects are 
quite limited in their ability to count cards and update odds, even after much practice 
in the task. For example, Fig. 2 plots the probabilistic judgments reported by subjects
(N=45) for one card type (squares) in one game (11 turns). The plot shows that 
subjects are “anchored” to the baseline (initial) odds and they make rather sluggish 
“adjustments” compared to the actual odds. The plot also shows the results of a 
computational (probabilistic) model [4] that simulates this anchoring and adjustment 
behavior. The model compares well to the data, both in mean response (line) and 
standard deviation (bars). This is an example of how TRACS offers rigor - in 
laboratory investigations to model human judgment and decision making.

Fig. 1. The TRACS cards (Copyright 2002 by K. Burns). The double-sided deck contains
24 clue/truth (back/front) cards. The front of each card (Red or Blue) illustrates the distribution

Fig. 2. Probability (% Red) vs. turn (#) for one card type (squares). Model (gray) against data
(black). Lines show mean; bars show standard deviation. Dotted line shows actual probability

Subsequent experiments were performed on a game called Spy TRACS [5], which 
is similar to Straight TRACS except that the player does not have to count cards. 
Instead, the player is given the “deck odds” for each turn, along with “spy odds” from 
an independent source (i.e., a simulated spy). The object is to combine the deck odds 
and spy odds (in a task of Bayesian inference) to estimate the “final odds” as a basis 
for choosing a card to turn. Experiments on Spy TRACS offer insight into cognitive 
“conservatism” in Bayesian inference [6], and this insight was used to design a 
diagram called “Bayesian Boxes” that helps correct conservatism [5]. This is an 
example of how TRACS offers relevance - for practical applications to improve 
human judgment and decision making. 
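
The combination of deck odds and spy odds can be illustrated with the odds form of Bayes' rule. This is a sketch under the assumption that the deck odds act as prior odds and the spy odds act as an independent likelihood ratio; the text does not spell out the exact formulation used in Spy TRACS, and the function names are hypothetical.

```python
from fractions import Fraction


def combine_odds(deck_odds, spy_odds):
    """Posterior odds = prior odds x likelihood ratio (odds form of Bayes' rule).

    Assumes the spy's report is independent of the deck composition.
    """
    if deck_odds < 0 or spy_odds < 0:
        raise ValueError("odds must be non-negative")
    return deck_odds * spy_odds


def odds_to_probability(odds):
    """Convert odds in favour (a:b expressed as a/b) to a probability."""
    return odds / (1 + odds)


if __name__ == "__main__":
    deck = Fraction(3, 1)   # hypothetical deck odds of 3:1 for Red
    spy = Fraction(1, 2)    # hypothetical spy odds of 1:2 for Red
    final = combine_odds(deck, spy)
    print(final, odds_to_probability(final))
```

With these hypothetical inputs the final odds are 3:2, a probability of 3/5 for Red. A "conservative" judge, in the sense used in the text, would report final odds closer to the deck odds than this product.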

Additional experiments are currently being planned for a multi-player game called 
Poker TRACS [1]. In this game, the diagnoses are more complex because information 
is conveyed by intentional actions (when bets are made and raised) as well as physical 
events (when cards are dealt and turned). Similarly, the decisions are more complex 
because they involve chips (in betting) as well as cards (in drawing), and because they 
require projections of future game states as well as assessments of current game 
states. These cognitive challenges make the game relevant to practical problems in the 
world [7], and the double-sided deck makes the game tractable for rigorous research 
in the lab.



Abstract. The ‘graphicacy’ of student programmers was investigated
using several cognitive tasks designed to assess ER knowledge repre- 
sentation at the perceptual, semantic and output levels of the cogni- 
tive system. A large corpus of external representations (ERs) was used 
as stimuli. The question ‘How domain-specific is the ER knowledge 
of programmers?’ was addressed. Results showed that performance for 
programming-specific ER forms was equal to or slightly better than
performance for non-specific ERs on the decision, naming and func- 
tional knowledge tasks, but not the categorisation task. Surprisingly, tree 
and network diagrams were particularly poorly named and categorised. 
Across the ER tasks, performance was found to be highest for textual 
ERs, lists, maps and notations (more ubiquitous, ‘everyday’ ER forms). 
Decision task performance was generally good across ER types indicating 
that participants were able to recognise the visual form of a wide range 
of ERs at a perceptual level. In general, the patterns of performance 
seem to be consistent with those described for the cognitive processing 
of visual objects. 



1 Introduction 

A range of ERs were used as stimuli for a range of cognitive tasks: ER decision, 
categorisation, functional knowledge and naming (Cox, in preparation). The aim 
was to assess ER knowledge representation at different levels of the cognitive 
system using an approach informed by picture and object recognition and naming 
research [2].

2 Method 

Participants were 17 computer science undergraduates (14 male). The study was
done in the context of a larger study [3]. Each participant completed 4 ER tasks
and several programming tasks (e.g. program debugging¹).

¹ The programming task results are not reported here due to lack of space. For further
details, see: www.cogs.susx.ac.uk/projects/crusade/

Fig. 1. Examples from the ER corpus



ER task stimuli consisted of 90 ERs including maps, set diagrams, text, 
lists, tables, graphs & charts, trees, node & arc, plans, notations & symbols, 
pictures/illustrations, scientific diagrams, & icons (Figure 1). Twenty-two ‘fake’
or chimeric diagrams were also included in the case of the decision task. A wide 
range of ERs was employed so that, in future work, the corpus contains items 
suitable for use with a variety of subject samples. 

The ER task sequence was decision, categorisation, functional knowledge
and naming. ER presentation order was randomised across subjects. The decision
task was a visual recognition task requiring real/fake decisions. The categorisation
task assessed semantic knowledge of ERs: subjects categorised each
representation as ‘graph or chart’, ‘icon/logo’, ‘map’, etc. In the functional
knowledge task, subjects were asked ‘What is this ER’s function?’ An example
of one of the (12) response options is ‘Shows patterns and/or relationships of data
at a point in time’. In the naming task, for each ER, subjects chose a name
from a list. Examples: ‘Venn diagram’, ‘timetable’, ‘scatterplot’, ‘Gantt chart’,
‘entity relation (ER) diagram’, etc.

3 Results and Discussion 

Figure 2 shows that participants made fewest errors on the decision task, followed
by the first semantic task (categorisation), then naming (an output task),
with poorest performance on the other semantic task (functional knowledge).
Inter-task correlations were also computed; only the categorisation and naming
tasks were significantly correlated (r = .56, p < .05). The relatively good decision
task performance indicated that the participants were able to recognise the visual
form of a wide range of ERs at a perceptual level. Close-to-chance (50%) decision
performance, however, was observed for graphs and charts, icons and logos,
and fakes. The categorisation and functional knowledge tasks measure different
aspects of a person’s semantic knowledge of ERs. The categorisation task can be
performed on the basis of relatively broad ER features and attributes (i.e. perceptual
classification). In contrast, the functional knowledge task involves more
subordinate levels of knowledge, such as mental representations of ER ‘applicability
conditions’. Categorisation performance was lowest for network diagrams,
set diagrams and tree diagrams.

Fig. 2. ER task performance, by ER category. Mean proportion of correct responses,
averaged across ERs-within-categories and participants. Programming-specific ERs
(left), non domain-specific (right)
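
The significance of the inter-task correlation reported above can be checked with the standard conversion from a Pearson r to a t statistic. The sketch assumes that all 17 participants contributed to the correlation, which the text does not state explicitly; the function name `r_to_t` is hypothetical.

```python
import math


def r_to_t(r, n):
    """t statistic for testing a Pearson correlation r from n pairs (df = n - 2)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("at least three observations are needed")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    # r from the text; n = 17 is an assumption (all participants included)
    print(round(r_to_t(0.56, 17), 2))
```

Under that assumption the statistic is about 2.62 on 15 degrees of freedom, which exceeds the two-tailed .05 critical value of 2.131 and so agrees with the reported p < .05.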

Participants showed least difficulty in categorising ‘maps’ and ‘plans’. Maps
are a class of representations that have been argued to be a ‘basic’ category
in the organisation of ERs in semantic memory [1]. Lists, notations and textual
forms were also relatively easily categorised, understood and named. This
may be due to their familiarity: they are ubiquitous, ‘everyday’ ERs. Domain-specific
ERs for programming include lists, node and arrow diagrams, tables,
textual/linguistic representations, trees, and notations and formulae. Set diagrams,
networks and trees were particularly poorly understood, with very high
rates of misclassification and mis-naming (Figure 2). This was surprising given
that the participants were computer science undergraduates; however, normative
data based on wider sampling may be required in order to properly interpret
the significance of this result. In general, the patterns of performance across
the ER tasks were found to be consistent with those described for the cognitive
processing of visual objects [2]. Further analyses of the relationships between
performance on the ER tasks and program comprehension and debugging are
planned.



Abstract. This paper explores the potential of graphical communication for
cross-linguistic/cultural interaction. Results demonstrate that interactive 
graphical communication provides a useful cross-cultural communication tool. 
However, communicative success is limited by information type, such that
concepts that do not share a comparable visual form across cultures are less 
successfully communicated. 



1 Introduction 

Graphical communication is typically thought of as a one-way communication
process, e.g. reading a road sign. However, it is often an interactive two-way process,
e.g. architect-architect design interactions [1]. Graphical communication has been shown to
be of particular utility when language skills are compromised, e.g. in adults with aphasia
[2], or during cross-linguistic interaction [3]. In such cases the iconic nature of
graphics provides an advantage over the more conventionalized, symbolic form of
language.

In this paper we explore the power of cross-linguistic/cultural graphical 
communication, examining the communicative success of interacting partners drawn 
from different linguistic and cultural communities (Japanese and Western cultures). 



2 Method 

A graphical referential communication task [4] was used to study
cross-linguistic/cultural graphical communication. Pairs interacted graphically to identify a
set of predefined concepts. The Drawer depicted 12 concepts from an ordered list
(12 targets plus 4 distracters) such that their partner, the Matcher, could identify each
concept from their unordered list. As in the game ‘Pictionary’, partners were not
permitted to talk or use text in their drawings.

Partners played 6 games, using the same set of items (presented in a random order)
in each game. Drawing and matching roles were alternated from game to game. Both
participants could draw at any time. Graphical communication was carried out
remotely, with partners seated in adjoining rooms and communicating via an electronic
whiteboard with graphics tablet and stylus. The whiteboard tool recorded drawing
activity and communicative success [5]. However, subjects were not informed of their
success.

2.1 Subjects 

76 subjects participated in the study for payment. 21 subject pairs participated in the
Same culture condition (Japanese-Japanese) and 17 pairs in the Mixed culture
condition (Japanese-Western).

2.2 Materials 

In collaboration with Japanese colleagues, a 16-item concept list was constructed. 
Half the items constitute Global items, those sharing both meaning and 
representational form across cultures (i.e. Arnold Schwarzenegger, Clint Eastwood, 
Robert DeNiro, Television, Computer Monitor, Museum, Art Gallery, House). The 
other half constitutes Local items, items that share a comparable meaning but have a 
different representational form across cultures (i.e. Breakfast, Drama, Restaurant, 
Actor, University, Monument, Parliament, Cartoon). 

3 Results and Discussion 

Identification rates (proportion of items correctly identified by matchers) were 
calculated for each pair. As can be seen in Fig. 1a, identification rates in both the
same culture and mixed culture conditions improved across games. Note however 
that this improvement is more marked in the same culture condition. 

Proportion scores were entered into a 2 x 6 Analysis of Variance. Analyses were
conducted by subject (F1) and by item (F2). By-subject tests used a mixed design,
treating Culture (Same and Mixed) as a between-subject factor and Game (1 to 6) as
within. By-item tests treated both factors as within. Analyses returned a main effect
of Culture, F1(1, 36) = 7.8, F2(1, 15) = 6.5, and of Game, F1(5, 36) = 20.7, F2(5, 75) =
11.9, but no interaction, Fs < 1.5 (all results are reliable at p < .05).
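
The distinction between by-subject (F1) and by-item (F2) analyses comes down to how the trial-level data are averaged before the ANOVA. The sketch below shows only that aggregation step on a few made-up trials; the records, field names and the helper `mean_by` are hypothetical and are not the study's data or software.

```python
from collections import defaultdict


def mean_by(trials, key_fields):
    """Average the 0/1 'correct' field over all trials sharing the given key fields."""
    sums = defaultdict(lambda: [0, 0])
    for trial in trials:
        key = tuple(trial[field] for field in key_fields)
        sums[key][0] += trial["correct"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical toy records: one row per pair x game x item.
trials = [
    {"pair": "P1", "culture": "same", "game": 1, "item": "House", "correct": 1},
    {"pair": "P1", "culture": "same", "game": 1, "item": "Drama", "correct": 0},
    {"pair": "P2", "culture": "mixed", "game": 1, "item": "House", "correct": 1},
    {"pair": "P2", "culture": "mixed", "game": 1, "item": "Drama", "correct": 1},
]

# By-subject cells (input to F1): average over items within each pair and game.
by_subject = mean_by(trials, ("pair", "culture", "game"))

# By-item cells (input to F2): average over pairs within each item, culture and game.
by_item = mean_by(trials, ("item", "culture", "game"))
```

In the by-subject table each pair belongs to one culture only, which is why Culture is a between-subject factor there, whereas every item appears under both cultures, which is why both factors are within-item.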

Planned comparisons reveal that at Game 1 Japanese and Mixed culture pairs
exhibit comparable identification success. However, through interaction (Game 2
onwards) same culture pairs' identification rate improved more dramatically than that of their
mixed culture counterparts. Notably, these effects are reliable only when tested
by subject. Further analyses, treating global and local items separately, reveal distinct
item effects. Japanese and Mixed culture pairs' identification rate for global items
starts from a comparable level and increases at a comparable rate across games (Fig. 1b). Local item
analyses reveal a different pattern. In this condition, interaction leads to more
successful identification in the same culture condition (Game 2 onwards). Thus, it is
mixed culture pairs' inability to successfully negotiate local concepts that causes the
miscommunication evident in Fig. 1a. Figs. 2, 3 and 4 illustrate the different ways
local and global concepts were represented by mixed and same culture pairs.




Fig. 1a, b and c. Mean proportion of items correctly identified by matchers in Same Culture
(Japanese-Japanese) and Mixed Culture (Japanese-Western) conditions across six games of the
task. Fig. 1a combines Global and Local item data. Fig. 1b contains only Global item data
whereas Fig. 1c contains only Local item data

This study illustrates that graphical communication provides a valuable
cross-linguistic/cultural tool. Graphical interaction facilitated the successful communication
of partners drawn from different linguistic and cultural communities. However, the
relative success of their communication was dependent upon the nature of the
information conveyed, such that global concepts (i.e. those sharing semantic and
representational information) were more successfully communicated than local
concepts (i.e. those sharing only semantic information). Thus, although the iconic
nature of graphics provided a communicative advantage over the more symbolic form
of language, differences in representational form led to miscommunication. Put
another way, differences in partners’ ‘visual lexicon’ meant they were unable to
coordinate their visual representations, which in turn prevented them from establishing a
shared semantic representation.

Fig. 2. Changing representation of a Global concept (Arnold Schwarzenegger) drawn over six
games by a mixed culture (Japanese-Western) and same culture (Japanese) pair. In both
conditions partners aligned upon a comparable representation of Arnold Schwarzenegger, i.e.
his muscles. Communication was successful in both conditions

Fig. 3. Changing representation of a Local concept (Breakfast) drawn over six games by a
mixed culture (Japanese-Western) and same culture (Japanese) pair. The Japanese pair used a
culture specific representation of breakfast (rice and fish) whereas the mixed culture pair
aligned upon a less culturally sensitive representation (a slice of bread). Communication was
successful in both conditions




Fig. 4. Changing representation of a Local concept (Cartoon) drawn over six games by a mixed 
culture (Japanese-Western) and same culture (Japanese) pair. The Japanese pair used a culture 
specific representation of cartoon (manga comic style). Members of the mixed culture pair 
used different, culture specific, representations of cartoon. Communication was successful 
only in the same culture condition



Abstract. This study examined the representation selection preference
patterns of participants in a database query task. In the database task, 
participants were provided with a choice of information-equivalent data 
representations and chose one of them to use in answering database 
queries. A range of database tasks were posed to participants - some 
required the identification of unique entities, some required the detec- 
tion of clusters of similar entities, and some involved the qualitative 
comparison of values, etc. Participants were divided, post hoc, into two 
groups on the basis of a pre-experimental task (card sort) designed to 
assess ‘knowledge of external representations’ (KER). Results showed 
that low and high KER groups differed most in terms of representation 
selection on cluster type database query tasks. Participants in the low 
group tended to change from more ‘graphical’ representations such as 
scatterplots to less complex representations (like bar charts or tables) 
from early to late trials. In contrast, high KER participants were able 
to successfully use a wider range of ER types. They also selected more 
‘appropriate’ ERs (i.e. ones that the diagrammatic reasoning literature
predicts to be well-matched to the task). 



1 Introduction 

Successful use of external representations (ERs) depends upon skillful matching 
of a particular representation with the demands of the task. [1] and [2] provide 
numerous examples of how a good fit between a task’s demands and particu- 
lar representations can facilitate search and read-off of information. [3] provides 
a review of studies that show that tasks involve perceiving relationships in data 
or making associations are best supported by graphs whereas ‘point value’ read- 
off is better facilitated by tabular representations. This paper extends our work 
(reported in [4]) by researching selection accuracy and preference patterns from 
early to late trials (within session effects) from a study of participants’ ability to 
select appropriate data displays for use in answering database query tasks. The 
tasks were based on a database of car information (e.g. fuel efficiency, engine 
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size, C02 emissions). Participants were presented with a range of different task 
types (identify a single entity, spot clusters, compare entities on one or more 
dimensions, etc) over 25 trials. Each trial consisted of one task type (associate, cluster, 
compare, correlate, distinguish, identify, locate and rank). On each trial, sub- 
jects were asked to choose the particular data display representation they felt 
would be most useful for answering the query. The options were presented as an 
array of display-type icons (table, scatterplot, bar chart, etc). When a choice was 
made, an automated information visualisation engine (AIVE) then instantiated 
the chosen representational form with the data needed to answer the task. Each 
query (task) could potentially be answered with any of the display options 
offered, but each task type had an ‘optimal’ display type. Subjects then answered 
the query using their chosen visualization. Following a completed response, the 
subject was presented with the next task and the sequence was repeated. The 
following data were recorded: the user’s representation choices; time to read ques- 
tion and select representation (selection); time to answer question using chosen 
representation (answer); and participants’ responses to questions. Further de- 
tails about the experimental procedure are provided in [4] . Prior to the database 
query tasks, participants were administered a card-sort task [5, 6] designed to 
assess their KERs. The task involves sorting and labeling a large corpus of ER 
examples. The aim was to study the relationship between subjects’ prior 
knowledge (or ‘repertoire’ of ERs) and their reasoning accuracy and representation 
selection performance on the database query tasks. 



2 Results and Discussion 

Participants were divided into two groups on the basis of a post-hoc median-split 
on ER card-sort cluster scores. This yielded two groups - ‘typical’ card-sorters 
(high KER) and ‘more idiosyncratic’ card-sorters (low KER) [4]. Overall both 
groups improved their response accuracy from early to late trials: the low KER 
group improved from 64% to 83%, and the high KER group from 75% to 84%. 
The 25 database task types were collapsed into 3 groups: 1. tasks requiring 
the precise read-off of values; 2. those involving qualitative comparison and 3. 
cluster tasks (involving associating entities, identifying groups of similar entities, 
etc). To assess selection accuracy, higher scores were assigned where subjects 
selected representations that the literature predicts are most appropriate for 
the task. These include tables for read-off value tasks; bar charts for qualitative 
comparison; and scatter plots for cluster tasks. Representation selection from 
early to late trials over the different task groups showed that the low group 
tended to change from more to less semantically complex representations. In 
contrast the high KER group used a wider range of ER types in early and late 
trials, and selected more ERs that were predicted by the literature to be ‘good’ 
ER-to-task matches. This effect is particularly noticeable for cluster tasks. Fig. 1 
shows the early and late cluster trial selection behaviour. 
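
The grouping and scoring steps described above can be sketched in code. This is a minimal illustration only, not the analysis procedure used in the study: the participant scores and trials are invented, assigning scores equal to the median to the low group is an assumption, scoring a selection as simply matching or not matching the predicted display is a simplification, and all function and variable names are hypothetical. Only the mapping from task group to predicted display comes from the text.

```python
from statistics import median

# Display type that the literature predicts to suit each task group (from the text).
PREDICTED_DISPLAY = {
    "read-off": "table",
    "comparison": "bar chart",
    "cluster": "scatterplot",
}


def median_split(scores):
    """Assign each participant to 'high' or 'low' KER by a median split.

    Assumption: scores equal to the median go to the 'low' group.
    """
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}


def selection_accuracy(trials):
    """Proportion of trials on which the chosen display is the predicted one."""
    hits = sum(1 for group, display in trials if PREDICTED_DISPLAY[group] == display)
    return hits / len(trials)


# Invented card-sort cluster scores for four hypothetical participants.
scores = {"p1": 0.42, "p2": 0.77, "p3": 0.58, "p4": 0.91}
groups = median_split(scores)

# Invented (task group, chosen display) pairs for one participant.
trials = [("cluster", "scatterplot"), ("cluster", "bar chart"),
          ("read-off", "table"), ("comparison", "bar chart")]
accuracy = selection_accuracy(trials)
```

Computing selection_accuracy separately for early and late trials within each group would give the kind of within-session comparison reported here.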

Fig. 1. Frequency with which high/low ER knowledge groups used each type of 
representation in cluster type tasks early and late trials (SE bars shown) 

Fig. 2. Examples of AIVE bar, pie, plot, rose (sector) graph and table representations 

High KER subjects tended to use scatterplots (appropriate) in cluster tasks 
whereas, although low KER subjects often started with these, they later reverted 
to simpler ERs like bar charts and tables - with greater response accuracy than 
the higher group. In contrast the low KER group tended to change to less se- 
mantically complex ERs which some literature ([1, 2, 3]) predicts to be good 
ER-to-task matches. However, our results show that the selection of such ERs 
may not result in as great a performance decrement in those subjects as might be 
expected. Our research also shows that there are differences in how low and high 
KER groups change their selection behaviour over time where difficult tasks are 
presented. The high KER group tends to match ERs to tasks which have been 
predicted from the literature to be ‘good’ matches. In contrast, the low KER 
group tends to change to less semantically complex representations which are 
not predicted to be task-appropriate representations. The next phase of this re- 
search will investigate relationships between subjects’ difficulty with particular 
ERs, their skill on the card sort task, and their ER classification and labelling 
performance. 

1 Introduction 

A potential educational advantage of animated diagrams over static depictions is their 
capacity to provide explicit external representation of temporal changes that occur in 
dynamic subject matter [1]. However, animations are not necessarily better for 
learners [2]. One possible reason is a mismatch between a specific animation's 
presentational characteristics and a particular learner's processing capacity [cf. 3]. For 
example, if the animation's playing speed is too high, the learner may miss some key 
aspects of the content. User control has been suggested as a possible way to address 
such mismatch problems [4]. The assumption here is that the learner regulates the 
animation's playing regime in ways that present information relevant to the task at 
hand in an appropriate fashion. In the previous example, this would involve reducing 
the animation's speed so that key aspects could be readily extracted. Current 
computer-based animation systems can provide users with extensive control over 
speed and various other characteristics of animations. Unfortunately, the provision of 
user control does not always result in the desired learning improvements [5]. 

The limited effectiveness of user-controllable animations may stem from the 
inability of learners to take proper advantage of the control provided. This seems 
especially likely when animations that depict complex subject matter are used by 
learners who are relative novices in the depicted content domain. Learners must be 
able to exercise user control in ways that allow them to locate, extract, and then 
meaningfully integrate thematically relevant information as the basis for building an 
appropriate dynamic mental model of the referent situation. Interrogation of the 
animation needs to be highly strategic, with well-targeted spatial and temporal search 
of the presented information. However, domain novices lack the background 
knowledge necessary for such targeting so their search may be relatively ineffective. 
This poster reports a fine-grained investigation of how novices in the domain of 
meteorology interrogated an animated weather map during a learning task. The 
investigation's purpose was to explore the nature of this interrogation in order to 
better understand why user control is sometimes not as effective as expected. In order 
to investigate what exploration strategies learners would choose when given 
maximum flexibility to interrogate the animation, the player allowed users to control 
the presentation in a wide variety of ways (speed, direction, continuity) via a set of 
video-like controls. 

Fig. 1. Example interrogation plot showing initial surveys of animation followed by drawing of 
individual weather map features 



2 Method 

Ten randomly selected undergraduate Teacher Education students participated 
individually in this study with the goal of learning how weather map patterns change 
over time. Participants interrogated a 28-frame user-controllable weather map 
animation to help them build up a drawn prediction of the pattern of meteorological 
markings likely to appear 24 hours after those shown on a given weather map (the 
‘Original’). Interactions with the animation and associated thinking-aloud were 
recorded on video while being synchronously combined with video of the blank map 
upon which the participant was drawing the predicted markings. The resulting 
‘picture-in-picture’ composite video was captured on hard disk in real time allowing 
immediate replay for eliciting stimulated recall of the participants' interrogation 
strategies. Interrogation data were used to plot animation frame displayed versus time 
(Figure 1) and analysed with respect to the scope, speed and distribution of 
exploration. Concurrent think-aloud protocols, retrospective accounts from the 
stimulated recall, and participants' gestures were used to aid interpretation of the 
interrogation plots. 

3 Results and Discussion 



Overall Approach 

Participants spent between 6.9 and 32.0 minutes in total (M = 13.4, SD = 7.2) working 
with the animation. Most of this period was spent with the animation paused rather 
than being played; the mean active interrogation time was only 4.6 (SD = 4.2) 
minutes. Interrogation typically began with the participant's activity divided between 
exploration of the animation (it being either played or paused) and examination of the 
Original weather map. Following this, drawing activity commenced and was carried 
out via a series of episodes during each of which the animation was paused on a 
particular frame. These drawing episodes were interspersed with further periods of 
active interrogation and additional non-drawing pauses. 

Analysis of participants' concurrent and retrospective verbalisations (clarified by 
considering their associated gestures) suggests the following general strategy. 
Initially, markings presented in the animation were compared with those depicted in 
the Original to determine which parts of the animation would be likely to assist with 
the prediction task. During this comparison, the user control facility was employed to 
assist with broad spatial and temporal surveys of weather map information by pausing 
or playing the animation respectively. Subsequent drawing of the predicted weather 
pattern was based on a series of more focused feature-by-feature interrogations of the 
animation, each of which was prompted by a decision to investigate how a specific 
feature within the Original map changed over a 24 hour period. 

Scope 

There was considerable variation in the scope of the sweeps that participants made 
through the animation. When all sweeps made by participants were categorised 
according to their scope (short, medium, long, extensive), short sweeps covering 7 or 
fewer frames accounted for 66% of sweeps made, exceeding all other categories 
combined. The longer sweeps covering most or all of the animation's 28 frames 
tended to be concentrated early in the interrogation, while shorter scans were more 
common later among the drawing episodes. Video records suggest that the longer 
sweeps were used to survey all or part of the animation for potentially useful 
information. In contrast, many of the shorter sweeps resulted from participants 
observing what happened to a meteorological feature over a 24 hour period 
(represented by 4 frames of the animation). 

Speed 

Most temporal interrogation of the animation was carried out by playing it slowly 
rather than quickly, with 45% of all sweeps being done step-by-step (i.e. below the 
two frames per second required to give the illusion of continuous change). At the 
opposite extreme, the fastest sweeps tended to be made through the animation in the 
reverse direction, in contrast to the more usual forward playing direction associated 
with slower sweeps. However, forward sweeps predominated and were used for close 
analysis of the daily changes that occurred for individual features. The main function 
of the fast reverse sweeps was to return to parts of the animation that were to receive 
this close analysis. 
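
The scope and speed criteria reported above can be expressed as a small classification sketch. It is illustrative only and is not the coding scheme used in the study: the Sweep record format, the example sweeps and every name below are hypothetical, and a sweep's speed is assumed to be the number of frames covered divided by its duration. Only the two thresholds (7 or fewer frames for a short sweep, two frames per second for the illusion of continuous change) come from the text.

```python
from dataclasses import dataclass

SHORT_SCOPE_MAX_FRAMES = 7   # "short" sweeps cover 7 or fewer frames (from the text)
CONTINUOUS_MIN_FPS = 2.0     # below this rate playback is step-by-step (from the text)


@dataclass
class Sweep:
    """One uninterrupted pass through the animation (hypothetical record format)."""
    start_frame: int
    end_frame: int
    duration_s: float  # assumed to be greater than zero

    @property
    def frames_covered(self) -> int:
        return abs(self.end_frame - self.start_frame) + 1

    @property
    def direction(self) -> str:
        return "forward" if self.end_frame >= self.start_frame else "reverse"

    @property
    def frames_per_second(self) -> float:
        return self.frames_covered / self.duration_s

    def is_short(self) -> bool:
        return self.frames_covered <= SHORT_SCOPE_MAX_FRAMES

    def is_step_by_step(self) -> bool:
        return self.frames_per_second < CONTINUOUS_MIN_FPS


# Invented sweeps: an initial survey, a fast rewind, and a close feature inspection.
sweeps = [Sweep(1, 28, 10.0), Sweep(28, 12, 2.0), Sweep(12, 15, 8.0)]
short_share = sum(s.is_short() for s in sweeps) / len(sweeps)
```

Applied to a full interrogation record, shares computed this way would correspond to figures such as the 66% of sweeps that were short and the 45% that were step-by-step.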

Distribution 

Only one of the ten participants distributed interrogation evenly across the animation's 
28 frames. The remainder tended to concentrate on short sequences of frames 
(typically visiting the same series of 4 to 5 frames repeatedly) while frames 
comprising the rest of the animation were visited relatively infrequently. The 
animation sub-sequences that participants explored most intensively were those 
containing frames bearing close superficial resemblance to features in the Original. 
One of these sub-sequences is at the beginning of the animation and the other is at the 
end. However, more perceptually-subtle dynamic information of equal or greater 
thematic relevance in the middle section of the animation was largely neglected. For 
example, subjects ignored the gentle undulations of adjacent isobars changing in a 
coordinated fashion, key behaviour typical of disturbances traveling through a fluid 
(in this case, the air). 

4 Conclusion 

A distinctive feature of the way participants employed user control for this task was 
their relative neglect of the animation's dynamic aspects. For most of the time, the 
animation was either stopped on a frame or was being viewed stepwise (a frame at a 
time). Exploration was largely driven by a narrow search for superficial resemblances 
between localised aspects of particular animation frames and the learning task 
material (i.e. the Original map). Rather than running the animation in a variety of 
ways to study the dynamic behaviour of meteorological patterns, participants 
concentrated on the static characteristics of isolated meteorological features. The 
general approach was one of surveying the two representations to match the states of 
individual features, stepping the animation from the ‘before’ to the ‘after’ state (24 
hours later), then copying the chosen feature's final appearance onto the blank map. A 
consequence of this ‘state-focused’ approach was that valuable information about how 
features change in concert was passed over. The resulting predictions were a 
disjointed patchwork of individual fragments. Much key information such as the 
coordinated fluid-like behaviour of adjacent isobars was neither located nor extracted. 
The fragmentary feature-level information obtained resisted integration into a 
meaningful whole. It is likely that the interrogation strategies adopted by participants 
in this study were strongly influenced by the specific learning task employed. Rather 
than focusing their strategies upon learning about how meteorological markings 
change over time, they responded to the highly demanding task environment 
primarily in ways that made it more tractable. This meant they missed key 
information about the tightly integrated temporal relations that prevail among the 
various meteorological features comprising a weather map despite superficial changes 
in feature appearance. 

Abstract. Previous research suggests that the properties of animation can affect 
the inferences that learners make. Coherent integration of information is critical 
for comprehensive understanding of animation. This paper explores how 
learners link and integrate information from animated diagrams. Results suggest 
that attention focus is crucial to this process, and that animation design affects 
the salience of dynamic information links. 



1 Introduction 

Previous research shows that animated diagrams provide both benefits and 
disadvantages for learning. Although animation depicts movement more visually 
explicitly than static representations, the fundamental form of animation brings 
cognitive complexities for learners. Visual explicitness increases perceptually 
available information, consequently increasing memory and cognitive load. The 
transient nature of the representation prevents reaccess of information [7] and the 
large (unmanageable) amount of salient perceptual information displayed impedes 
focus of attention [2]. In addition, the type of graphical change significantly affects 
inferences made [3], and affects interpretation of the events and concepts being 
displayed [4]. However, for full understanding of a system, learners must also 
understand the relationship between the graphical changes, and integrate this into 
coherent patterns of movement. 

Successful integration of information requires making coherent and relevant links 
between the various types of information on the animation. Integrating information 
from animated diagrams may be problematic due to the basic form of animation, 
which renders information into a series of multiple representations, requiring learners 
to remember previous transitory information (e.g. object location, relational 
components) [7] and increasing the amount of visual search needed. Parsing research 
suggests the importance of breaking information into smaller understandable 
components [6]. Kaiser et al. [2] showed that animation facilitated accurate 
observation where only one dimension of dynamic information was present, 
suggesting that the number of simultaneous dynamic events is also cognitively 
influential. Furthermore, integration of information is facilitated according to how 
information is organized [1]. Parsing a diagram inherently guides and focuses 
attention. Together with sequencing information, this relieves learners from deciding 
which aspects are important and in which order to 'read' information. This may reduce 
confusion and facilitate information integration. Decomposing the animation for the 
learner removes one step of the cognitive load required for understanding the 
animation. This research explores how learners integrate information from an 
animated diagram by providing diagrams with smaller pieces of information from 
which links could be made. Learners were also tested on whether or not these links 
were achieved and how this contributed to their understanding. Two studies 
investigated (i) the relational links that learners may or may not make, and (ii) the 
effectiveness of differently parsed animations on learners' ability to link and integrate 
information (also see [5]). 



2 Studies and Results 

A heart animation was modified to display six sections of a diagram in a visually 
explicit way by semi-transparent masking of the animation. When a masked area is 
uncovered, the revealed information is clearly evident. The animation was modified in 
two different ways: ‘additive’ (fig. 1), where an additional component of information 
is exposed at the same time as the previous, until the whole diagram is completely 
visible; and ‘substitutive’ (fig. 2), where a currently displayed section becomes 
masked again when a new component is displayed. Thus, only one section is 
displayed at a time. After all sections have been displayed the full animation can be 
viewed. 

Sixteen pupils from mixed ability science classes took part in a qualitative study 
generating descriptions of the cognitive process of information integration. The task 
was to understand bloodflow pathways through the heart. Data was obtained through 
verbal protocols from pairs of pupils working with different animations. Pupils were 
encouraged to verbalise what they saw and understood from each section of the 
animation, the links between sections, and describe the blood flow process. Learning 
was also assessed using a written test immediately after the interview. Sixty different 
pupils took part in an experimental study: 12 in each experimental condition (additive 
animation; substitute animation; control animation) and 24 in the test control group 
(12 for each half of the test), who completed all pre and post test papers, but did not 
receive any experimental manipulation. Pupils worked individually in this study. Data 
was obtained through a written assessment and a computer-presented diagram test 
completed immediately after working with the animation. Pre and post-tests were 
given to pupils in both studies using a split half technique, the pre-test being 
completed on an individual basis three weeks prior to data collection. 

Fig. 1. Example frame, additive animation 

Fig. 2. Example frame, substitute animation 

Study 1 showed significant differences in the number of referential links made 
between sections of information according to permutation of animation. Six links 
were identified as important for understanding the flow of blood through the heart. 
With the additive animation a total of 28 out of a possible 48 links were made, in 
contrast to 8 out of a possible 48 with the substitute animation (see [5]). A one way 
independent ANOVA showed significant differences in test scores according to 
animation permutation (p = 0.02). Post hoc tests showed that the additive group 
performed significantly better than the substitute group (p = 0.02), but not the control 
group. A mixed ANOVA showed significant differences in performance between pre 
and post-tests (p = 0.00). A significant interaction (p = 0.03) suggests that those in the 
control and additive conditions scored significantly higher on their post than pre-tests, 
whereas those in the substitute condition did not (see fig. 3). 
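
For a sense of scale, the link counts reported for Study 1 can be converted to proportions. The short calculation below uses only the counts given in the text (28 and 8 links out of a possible 48 per animation); the variable names are illustrative.

```python
POSSIBLE_LINKS = 48  # possible links per animation, as reported in the text

links_made = {"additive": 28, "substitute": 8}

for animation, made in links_made.items():
    share = made / POSSIBLE_LINKS
    print(f"{animation}: {made}/{POSSIBLE_LINKS} links = {share:.0%}")

# How many times more links were made with the additive animation.
ratio = links_made["additive"] / links_made["substitute"]
```

On these counts, pupils using the additive animation made a little over half of the possible links, and three and a half times as many as pupils using the substitute animation.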




Fig. 3. Graph showing pre and post-test score according to animation 

3 Discussion 

In study 1 accuracies and omissions of information from the parsed animations were 
similar across the two animations. Despite the increasing amount of the animation 
information on the additive animation, learners still maintained their attention focus. 
This suggests that the amount of perceptually available information is problematic for 
comprehension, due to resulting reduction in focus of attention. Focusing attention 
inherently reduces the amount of perceptual information being processed, and in a 
cyclical animation makes reaccess to specific parts of the dynamics less demanding. 
This study suggests that the additive animation enabled notably more links between 
parsed sections to be made than the substitute animation. 

Study 2 confirms that understanding blood flow pathways and making appropriate 
information links was more problematic with the substitute animation than with the 
additive or the control animation. Several factors could contribute to this result; (i) the 
substitute animation resulted in clearly segregated sections, unlike the additive 
animation, where parsed sections remain on the screen, making links more explicit 
from one section to the next. This supports learners in integrating information by 
explicitly linking consecutive sections of information; (ii) the substitute animation 
inherently results in a multiple representational format, thus requiring learners to 
integrate across representations. Animation may show dynamic changes more 
explicitly but without specific reference to linked aspects this makes integration 
difficult and adds to the memory load needed to remember information across 
representations. The ‘masking’ design should remove problems associated with 
multiple representations, as the whole animation is simultaneously perceptible. 
However, directing attention to specific sections of the diagram through parsing may 
hinder attention to the masked areas. 

In summary these results suggest that if attention focus is facilitated and guided 
then the consequences of the amount of perceptually available information may be 
counterbalanced in terms of detrimental effects on learning. Although parsing the 
animation into smaller components may be beneficial in reintegrating information, it 
is not the only important aspect in facilitating integration of information. Relevant 
links between these pieces of information also need to be made, and it is evident that 
to facilitate integration these links need to be made explicit within the representation. 

Extended Abstract 

What is expertise? In the context of recent growing interest in the significant role of 
external representations in reasoning and learning, this traditional question may need to 
be reconsidered. In artificial intelligence and cognitive science, expert systems that 
flourished in the early 80's were a bold trial to capture expertise in order to implement it 
in computers. But the inherent tacitness of expertise made it elusive to elicitation. 
According to the situated-cognition view (e.g. [1]), knowledge or expertise is not an 
entity describable independently of the situation surrounding an expert; it is not 
storable and retrievable as an entity in a computer the way a physical object is stored in 
and retrieved from a drawer. 

A better understanding of expertise may come from investigating how the cognitive 
processes of people change as they become experts. Expertise embodies the kinds of 
features and relations an expert perceives in an external representation at hand and how 
the expert interprets them. The ways experts perceive and conceive are significant 
components of expertise. This implies that experts are able to differentiate and perceive 
some features and relations in the external representation that would be meaningless to 
novices. Therefore, acquisition of expertise can be regarded as a process of becoming 
able to perceive what was not evident previously and conceive what was impossible to 
think of. Gibson and Gibson [2] described a similar process in relation to expertise, in 
their case, wine-tasting: “Perceptual learning, then, consists of responding to variables 
of physical stimulation not previously responded to” (p.34). To become an expert, one 
needs to be perceptually reactive to even the most subtle features and relations and 
conceptually productive of meaningful interpretations. 

What, then, does it take to become able to do so? How could people be cognitively 
trained to do so? Suwa and Tversky [3] [4] have suggested that meta-cognition of one's 
own perception and conception can be an effective method of training. Anecdotal 
evidence is from the domain of sports. Major League Baseball player “Ichiro”, who 
joined the Seattle Mariners from Japan three years ago, expressed the following in a TV 
interview: he became ‘the best of pros’ when he was able to theorize conceptually how 
he perceived the ball and how his body reacted and hit it. Further he added that it took 
him 4 years to do this. He meant that, before the four-year mental effort, his body was 
producing hits just automatically without his knowing how. This anecdote implies that 
the notion of automaticity associated with expertise should be reconsidered. A mental 
state in which a person does not need to pay attention to what he or she is doing cognitively, 
e.g. not being self-aware of how hands and feet are moving in gear shift in driving, may 
not be the goal state of the process of becoming an expert but a mere intermediate stage. 
Although the person has acquired the most basic skill in driving a car, there may exist 
multiple stages of driving skill that he or she has not yet reached. In order to reach a 
more skillful stage step by step through climbing the ladder of professionalism, he or 
she should make mental effort to break the stage of automatic performance and then be 
self-aware of what he or she is perceiving, conceiving and doing. In other words, 
becoming experts may be an endless process of “hunting” for what needs to be 
perceived and conceived. It has a problem-finding nature. 

For the past few years, Suwa and Tversky [3][4][5] have referred to the skill for 
becoming perceptually reactive and conceptually productive through meta-cognition as 
that of constructive perception. In a series of studies, they have employed a task of 
generating as many interpretations as possible from a single ambiguous drawing as a 
measure of that skill. Without the skill, people would be easily fixated on earlier 
interpretations and therefore unable to see features and relations in the drawing from a 
new perceptual frame of reference. They found that expert designers are able to 
produce more interpretations than design novices. This indicates that expert designers 
are more reactive to features and relations in the external world and more productive of 
meaningful interpretations. The skill of constructive perception might have facilitated 
the process of “hunting” in becoming an expert. 

How, then, could people be trained to be meta-cognitive in order to acquire the skill 
of constructive perception? In laboratory classes for undergraduate students at Chukyo 
University, I have provided practice of meta-cognition using examples in the real 
world. What kind of examples? Arnheim's claim provides a hint. He argued that visual 
attributes and spatial configurations in architectural spaces do not just bring functions 
into realization but also, more importantly, provide rich psychological effects onto 
viewers [6]. Thus, seeking examples in artistic drawings, photographs and architectural 
and environmental spaces in our daily life, I designed a cognitive training program in 
which each student is encouraged to 

• be self-aware of what elements and relations in the external world he or she attends 
to 

• interpret and explicitly articulate what kinds of emotion, feeling and/or 
interpretation are evoked by that perception and how. 

An important point is that there is no correct answer as to what kinds of interpretations 
should be generated for a given example such as a drawing, photograph, architectural 
space or whatever. Students were encouraged to evoke various emotions of their own 
and freely associate what they saw with various interpretations. Fifteen undergraduate 
students participated in this training program for 9 months. Before and after this 
cognitive training, I administered the ambiguous drawing test to the participants in 
order to evaluate if the training program has contributed to improvement of the ability 
of constructive perception. The results are the following. Most participants became 
able to generate more interpretations from an ambiguous drawing after the training; the 
average increase of the number of interpretations was 1.68 times. For comparison, I 
administered the same test twice to others including expert designers, design students 
and non-design office-workers. The interval between the first and second trial varied 
from five months to one and a half years. They, as the control group, were not given 
instruction about meta-cognition. The average increase of the number of interpretations 
for the control group was 1.08 times. The ANOVA test indicates that the increase of the 
number of interpretations for our 15 students is significantly larger than that for the 
control group, F(1,37) = 11.29 (p < 0.01). This indicates that our program worked 
effectively to let people acquire the ability of constructive perception through 
meta-cognition. 
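
The reported statistic can be checked with a short calculation. The sketch below rests on two assumptions that the text does not state: that the test was a one-way between-groups ANOVA with two groups, and that no participants were excluded. Under those assumptions the degrees of freedom imply 39 participants in total and therefore 24 in the control group. Because an F statistic with one numerator degree of freedom is the square of a t statistic, the p-value is obtained here by numerically integrating the t density; the function names are illustrative.

```python
import math


def t_pdf(t: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def p_value_f_1_df(f_stat: float, df_within: int, steps: int = 2000) -> float:
    """Upper-tail p-value of F(1, df_within), via the identity F = t**2.

    Integrates the t density from 0 to sqrt(F) with Simpson's rule.
    """
    t = math.sqrt(f_stat)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df_within) + t_pdf(t, df_within)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_within)
    central_area = total * h / 3      # P(0 < T < t)
    return 1.0 - 2.0 * central_area   # P(|T| > t) = P(F > f_stat)


df_between, df_within = 1, 37
total_participants = df_within + df_between + 1   # N = 39 for two groups
control_size = total_participants - 15            # trained students numbered 15
p = p_value_f_1_df(11.29, df_within)
```

The resulting p-value falls between 0.001 and 0.01, which is consistent with the reported p < 0.01.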

The finding of the present paper has shed light on the pedagogy of fundamentals, i.e. 
the ability to be perceptually sensitive and conceptually productive, to acquire 
knowledge or skills in various domains. 

Abstract. This paper investigates whether first year programming students can 
be helped to understand program behaviour through the use of Object 
(Instance) diagrams. Students were introduced to this diagramming technique 
as a way of visualising program behaviour and then given questions that tested 
their understanding of object referencing. It appears that drawing their own 
diagrams is a strategy applied by more successful students. Attempts to 
encourage all students to use this technique through scaffolding with partially 
completed diagrams failed however. Weaker students did not appear to find 
that the partially completed diagrams helped their understanding or go on to use 
the technique themselves. 



1 Introduction 

There seems to be general agreement amongst CS educators that many of our 
students have problems in mastering programming. One manifestation of a lack 
of understanding of program behaviour is that many students do not seem to be able 
to trace code. In particular, some students do not seem to be able to produce 
for themselves a diagram that demonstrates an understanding of object references, which, as 
has been pointed out, is a fundamental concept in learning object-oriented 
programming [1]. 

In order to understand a program's behaviour it is necessary for the programmer to 
have a model of the computer that will execute it. This ‘notional machine’ provides a 
“foundation for understanding the behaviour of running programs” [2]. 

In the introductory programming sequence in Aberystwyth, we demonstrate 
program behaviour in lectures and tutorials mainly by drawing pictures of variables 
and what they reference, as in Figure 1. These diagrams represent a rough UML 
Object (Instance) diagram [3] - essentially providing a snapshot of the objects in a 
system at some point in program execution. They are a diagrammatic representation 
of the ‘notional machine’ that can then be mentally animated to observe the changes 
in values as the code executes. Students have had quite a bit of practice in creating 
these diagrams. 
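
The reference behaviour that such a diagram is meant to expose can be sketched in a few lines. The example below is only an illustrative Python analogue of the tracing question shown in Figure 1 (the course itself used Java-style code); the `Person` class and its `set_address` method are hypothetical stand-ins, not the class used in the study.

```python
class Person:
    """Hypothetical stand-in for the Person class in the tracing question."""

    def __init__(self, name, address):
        self.name = name
        self.address = address

    def set_address(self, address):
        self.address = address


person1 = Person("Fred", "Aber")
person2 = Person("Bill", "Borth")

person2 = person1            # both variables now reference the same object
person1.set_address("Llan")  # so the change is visible through either variable

print(person2.name, person2.address)  # Fred Llan
print(person1 is person2)             # True
```

Drawing the two variables and the objects they reference makes it apparent that, after the assignment, the object created for "Bill" is no longer reachable from either variable.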

One problem we have observed is that when students might appropriately use this 
technique themselves (e.g., debugging their own programs), weaker students fail to do 
so. We have found that these students may be impatient when the instructor resorts to 
drawing a diagram, then surprised that the approach works. Even in the more 
restricted situation of a test that asks the students to trace out what happens when just 
a few lines of code are executed, as in Figure 1, scratch sheets are often returned 
blank. In a previous investigation we discovered that only 36% of the sheets were 
returned with any kind of ‘working out’. 

2 Background, Questions and Results 

In a previous Diagrams conference, Hegarty and Narayanan [4] outlined a cognitive 
model of understanding dynamic systems that supposes that the viewer: decomposes 
the system into simpler components; constructs a static model by making 
representational connections to prior knowledge and other components; integrates 
information between different representations (e.g. text and diagrams); hypothesizes 
lines of action, and finally constructs a dynamic mental model by mental animation. 

In addition, they have empirically validated the design guideline that people learn 
more from being induced to mentally animate a system before viewing an animation 
than by passively viewing the animation. If we examine Object diagrams we can see 
that the first two steps outlined in the Hegarty/Narayanan model are essentially the 
creation of the basic diagram, and the last three steps involve the actual tracing of the 
program code with reference to that diagram. This provides some justification that 
our approach of encouraging students to produce Object diagrams for their notional 
machine is a reasonable one. We were, however, frustrated that many students did not 
produce such diagrams themselves and we wanted to encourage them to do so. 

We performed an experiment that sought to answer three research questions about 
the use of Object diagrams (see below). The students were divided into a control 
group and a group who were given partially completed object diagrams (the 
Experimental group). Although these students were not explicitly told to use the 
diagrams, diagrams were identified as belonging with particular questions. A follow- 
up test was also given. See [5] for a complete report on the experiment. 




Person person1 = new Person("Fred", "Aber"); 
Person person2 = new Person("Bill", "Borth"); 
person2 = person1; 
person1.setAddress("Llan"); 

What is person2's address? 



Fig. 1. A partially completed Object diagram and related question 

Table 1: Correlation between Diagram Use and Performance 

Group                                     Follow-up Test Average on tracing questions 
Beginners who do not use diagrams (23)    47% 
Beginners who do use diagrams (45)        68% 

Table 2: Intervention Test Results 

Group                          Test Average 
Beginners Control (28)         28% 
Beginners Experimental (40)    36% 

Is drawing some kind of Object-like diagram correlated with success in 
solving multiple-choice tracing questions? There is an indication that the technique 
is used by higher-achieving students. The follow-up test did not provide students with 
diagrams but a higher score was obtained by those beginners who drew their own. 
This score came close to statistical significance (p=.07 with a one-tailed t-test). 

Does providing students with scaffolding in the form of partially completed 
Object diagrams help them correctly answer multiple-choice tracing questions? 

It seemed very unlikely that students would NOT do better if they were given partial 
diagrams than if we simply gave them the code with no help whatsoever. We wanted 
to confirm this conjecture but were very surprised by the results. When we initially 
tested the experimental group against the control group we discovered that Object 
diagrams hardly helped students trace code at all (not significant). When we looked at 
the results in a follow-up test, we saw that students who had been given the diagrams 
on the first test did slightly (but not significantly) worse than the control students. 

In light of this result it was not surprising that the answer to our third research 
question: Do students who have been provided with this scaffolding continue to 
use it in such multiple-choice questions? was ‘no’. Since the technique was not 
‘useful’, why would students continue to use it? 

We are still pondering the results of this experiment. We assumed that we were 
providing the students with what seems like a reasonable notional machine, but we 
have considered that by producing the diagrams ourselves we have removed the first 
two steps of the Hegarty/Narayanan cognitive model and thus short-circuited the 
students' creation of their own mental model. Research is still ongoing. 

Introduction 

Although it is apparent that people are able to make inferences from graphs, it is 
presently unclear how they do so, even from simple graphs. Current theories of graph 
comprehension are largely silent about the processes by which such inferences are 
made (e.g., Freedman & Shah, 2002; Pinker, 1990). We propose that people use 
spatial reasoning, in the form of spatial transformations (Trafton, Trickett, & Mintz, 
in press), to answer inferential questions. Spatial transformations are cognitive 
operations that a person performs on internal or external visualizations, such as 
graphs. They occur when people must mentally create or delete something (e.g., a 
line) on the image in order to facilitate problem solving, and may be related to 
hypothetical drawing (Shimojima & Fukaya, 2003). This paper investigates the use of 
spatial transformations when people need to make inferences from graphs. 



Method 

Eight GMU undergraduates participated for course credit. Participants were shown 30 
unlabelled line graphs and asked for the value of the y axis at a given point on the x 
axis. This point on the x axis was indicated by a red arrow in one of three different 
positions, creating three conditions: read-off (arrow beneath line), near (arrow slightly 
beyond line), and far (far beyond end of line) (see Figure 1 for examples). Participants 
were shown 10 of each graph/condition combination, in random order; performance 
was self-paced, with a blank screen appearing between each graph. 

To answer the question in the read-off condition participants had to move their 
eyes in perpendicular fashion from the red arrow to intersect the line, move their eyes 
from that intersection to the y axis, and read off the appropriate value. No spatial 
transformations were required for this task. To answer the question in the near and far 
conditions, we propose that participants would mentally extend the line prior to 
locating its intersection with the perpendicular from the red arrow and completing the 
task. The extension — a mental manipulation — constitutes a spatial transformation. 
Spatial transformation theory predicts that the longer the extension, the longer it 
takes; thus, we predict that participants will be fastest in the read-off (no extension) 
condition, somewhat slower in the near (shorter extension) condition, and slowest in 
the far (longer extension) condition. We also predict that accuracy will decrease with 
increased use of spatial transformations, as people must move further from “anchor 
points” on the graph to obtain needed information. Thus, we predict that participants 
will be most accurate in the read-off condition, somewhat less accurate in the near 
condition, and least accurate in the far condition. 

Fig. 1. Line graph read-off condition (left) and far condition (right). In the near condition (not 
illustrated) the arrow was 15 units to the right of the end of the line, along the x axis. 

A. Blackwell et al. (Eds.): Diagrams 2004, LNAI 2980, pp. 372-375, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 

Results and Discussion 

We measured accuracy as the absolute value of the participant's response minus the 
correct response. A score of 0 thus means the answer was completely accurate; 
increased scores represent a decrease in accuracy. Response times (RT) represent the 
amount of time between graph presentation and entry of the participant's response. 
Responses with an accuracy score of more than 100 were considered outliers and 
excluded from analysis, as were responses whose RT was greater than 3 standard 
deviations above the mean. Outliers constituted less than 5% of the data. 
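
The scoring and exclusion rules just described can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function names and the tuple layout are hypothetical, and the sketch assumes that the mean and (sample) standard deviation of RT are computed over all trials pooled, which the text does not specify.

```python
from statistics import mean, stdev


def accuracy_score(response, correct):
    """Absolute error: 0 is a perfectly accurate answer, larger is worse."""
    return abs(response - correct)


def exclude_outliers(trials, max_error=100, sd_cutoff=3):
    """Drop trials with error > max_error or RT > mean + sd_cutoff * SD.

    trials is a list of (response, correct, rt_seconds) tuples.
    Assumption: mean and SD are taken over all trials pooled together.
    """
    rts = [rt for _, _, rt in trials]
    rt_limit = mean(rts) + sd_cutoff * stdev(rts)
    return [
        (response, correct, rt)
        for response, correct, rt in trials
        if accuracy_score(response, correct) <= max_error and rt <= rt_limit
    ]
```

With pooled statistics, a single very slow trial inflates the standard deviation it is judged against, so in small data sets the RT criterion removes only extreme values.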




Fig. 2a (left). Mean accuracy scores for the read-off, near, and far tasks 

Fig. 2b (right). Mean response times in seconds for the read-off, near, and far tasks 

As Fig. 2a shows, participants were most accurate on the read-off task, less 
accurate on the near task, and least accurate on the far task, repeated measures 
ANOVA F(2, 14) = 15.46, p < .01, linear trend F(1, 7) = 18.98, p < .01. These results 
are consistent with our hypothesis that participants use spatial transformations to 
execute the inference tasks. The read-off task required no spatial transformations, but 
as participants mentally extended the line to find a hypothetical point of intersection 
with the arrow, the point of intersection became increasingly distant from the 
“anchor” of the y axis. Participants were decreasingly accurate as a result. 

It is possible that participants engaged in a speed-accuracy tradeoff; however, the 
response time data indicate that they became slower as they became less accurate. The 
response time data also point to the use of spatial transformations. Participants were 
fastest on the read-off task, slower on the near task, and slowest on the far task, F(2, 
14) = 4.93, p < .05, linear trend F(1, 7) = 6.7, p < .05 (see Figure 2b). The linear trend 
is consistent with the idea that a longer extension takes more time to execute than a 
shorter one. If this is true, each additional extension should take a measurable amount 
of extra time. In order to calculate how long each extra extension took, we did a multiple 
linear regression, using the distance along the x axis participants had to extend the 
line. This analysis was significant, r = .41, p < .01. The analysis yielded the following 
formula: Response Time = 8.21 + 1.28(1.2), where 8.21 seconds is the baseline time 
to read information from the graph and 1.28 is the amount of extra time required to 
extend the line each 1.2 cm distance required. This result supports our hypothesis that 
participants used spatial transformations, by indicating a systematic relationship 
between response time and the distance mentally traveled. As participants had to draw 
longer mental extensions to the graph, their response times systematically increased. 
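
As a worked illustration, the regression formula can be turned into a small predictor. The sketch below rests on one reading of the formula, namely that every 1.2 cm of mental extension adds 1.28 seconds to an 8.21 second baseline; the function name is hypothetical and the three constants are the values reported above.

```python
BASELINE_S = 8.21        # baseline time to read the graph, in seconds
SECONDS_PER_STEP = 1.28  # extra time per extension step, in seconds
STEP_CM = 1.2            # length of one extension step, in cm


def predicted_response_time(extension_cm):
    """Predicted RT in seconds for a mental extension of the given length.

    Assumes the linear reading: baseline + 1.28 s per 1.2 cm extended.
    """
    steps = extension_cm / STEP_CM
    return BASELINE_S + SECONDS_PER_STEP * steps


for distance in (0.0, 1.2, 2.4):
    print(distance, round(predicted_response_time(distance), 2))
```

Under this reading, a read-off trial (no extension) is predicted at the baseline, and each further 1.2 cm of extension adds the same fixed increment.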

Finally, participants' self-reports indicate that they used a spatial strategy. After 
participants completed the tasks, we interviewed them about their strategies for 
performing each task. Participants unanimously said something like “I went straight 
up and over” in reporting how they performed the read-off task. For the near and far 
tasks, they all reported some variation on extending the line. Typical responses 
included, “I imagined where the line would go,” “I estimated how far you think -, the 
angle the line is going to go,” “You have to project the line with your eye and then go 
from the arrow to the middle and over to the y axis.” These comments indicate that 
participants relied on the spatial characteristics of the graph to make inferences. 

In summary, we have found that when people make inferences about simple 
graphs, their accuracy, response time, and self-report data suggest that they use spatial 
reasoning, in the form of spatial transformations. Given that current theories of graph 
comprehension provide no account of mechanisms by which people make such 
inferences, we propose that a comprehensive theory of graph comprehension should 
accommodate spatial reasoning, as indicated by these data. 

Abstract. Certain domains, such as military activities and weather phenomena, 
are characterized by a large number of individual elements moving in a field, 
either in pursuit of an organized activity in groups at different levels of 
aggregation (military action), or subject to underlying physical forces that 
cluster different elements in different groups with common motion (weather). 
Reasoning about phenomena in such domains is often facilitated by abstracting 
the mass of data into diagrams of motions of groups, and overlaying them on 
diagrams that abstract static features into regions and curves. Constructing such 
diagrams of motion basically calls for clustering at different time instants and 
joining the centers of the clusters to produce lines of motion. However, because 
of incompleteness and noisiness of data, the best that can be done is to produce 
plausible hypotheses. We envision a multi-layered abductive inference 
approach in which hypotheses largely flow upwards from raw data to a diagram 
to be used by a problem solver, but there is also a top-down control that asks 
lower levels to supply alternatives if the original hypotheses are not deemed 
sufficiently coherent. 



1 Introduction 

In different domains, often the locations of a large number of entities at close 
intervals of time are available, but in order to be useful it is often necessary to 
abstract from this mass of information a condensed representation of salient 
information to the decision maker. Specifically, we wish to construct a diagram that 
captures the motions of significant groups of entities. Constructing such diagrams 
involves making hypotheses about groups at various instants and coming up with a 
hierarchical grouping hypothesis that is consistent across time. This diagram is to be 
used by a higher level diagrammatic reasoner for situation understanding, i.e., 
constructing an account of what is happening at higher levels of description. In the 
current paper, we discuss several issues related to the construction of such diagrams, 
with emphasis on the special features of the domain of interest, viz., construction of 
coherent accounts of groups and their motions from data obtained about the locations, 
sampled at several time instants over hours, of a large number of military units 
engaged in an exercise. We also outline an algorithm that solves some aspects of the 
problem. This part of the paper is in the spirit of an interim report on a large project. 

2 Discussion of the Problem 

Different types of diagrammatic abstractions for different inferential needs. A 
military commander looking at a movie - suitably speeded up - of thousands of 
military units, each represented by icons that point to their type, moving over a field 
that is marked with information corresponding to terrain and the location of the forces 
and assets of the other side will often quickly start making hypotheses about the plan, 
along the way constructing on a map of the region a diagram of locations and motions 
of various groups. Different types of diagrammatic abstractions help for different 
problem solving needs. The motions of individual units might be abstracted into 
motions of “blobs” [1]. This abstraction (below left, redrawn from the image 
produced by the algorithm in [1]) is useful for inferences where the spatial 
distribution of the group in motion is relevant, but it requires a temporal display like a 
movie. The same motions might also be abstracted into curves of motion (below 
center and right, of the same data), helping the problem solver to focus on direction of 
motion alone. In either case, an enormous amount of detail about the individual units is 
discarded so that salient phenomena can be attended to. 

Relevant and irrelevant motion. Depending on the specific inference need, a certain 
detailed bend in the curve of motion may be relevant or irrelevant and, for 
perspicuity, the motion line may be straightened. The curve on the right in the picture 
is a further abstraction of the figure in the middle, by abstracting away details in the 
motion. 

Available information is incomplete and noise-laden, but requirements of 
consistency can be exploited to overcome these problems. At different time instants, 
information about different units might be missing, and even when available, is prone 
to be noisy. Fortunately, we are not interested in the behavior of the individual units, 
and a robust summary of group behavior can still be constructed even in the face of 
noise and incompleteness. We can exploit the fact that phenomena have a lot of 
consistency - physical constraints make it impossible for radical changes in 
membership of groups at neighboring time instants. 




Detailed identity information is useful, but not essential. Our experiments indicate 
that the individual identity information is not crucial. This is good because real world 
data often do not contain identities of individuals. 

Information flow downwards from higher levels of problem solving can be used to 
improve lower level hypotheses. The diagrams are intended to be used for higher level 
problem solving, such as guessing the plans of the sides, either by an automated 
reasoning system, or by a human viewing the diagram on an interface. When such 
problem solving faces inconsistencies, that is a signal for selected lower level 
decisions to be questioned and alternate solutions to be sought. Appropriate design of 
these algorithms should be able to make use of such top-down information flow. 

3 Solution Approach 

We model the problem as one of layered abduction [2]. Each layer performs 
abductive inference, i.e., takes data to be explained and produces the hypothesis that 
best explains the data. In layered abduction, explanations from one level become the 
data to be explained at the higher level. In our work, the grouping and motion 
hypotheses are formed at one or at most two layers. Clustering algorithms produce 
a small number of alternative grouping hypotheses for each time instant, and 
hypotheses are chosen at each instant such that the combination of hypotheses across time 
instants is consistent. (That means that the overall hypothesis of motion of groups 
may not consist of a sequence of best hypotheses at each instant.) The best grouping 
and/or motion hypothesis is passed on to the problem solver. The problem solver will 
use the diagram to build an account of the maneuvers. If more detail is needed or the 
diagram as supplied does not produce a consistent account, the problem solver 
identifies an alternate hypothesis for the relevant part of the diagram, and the lower 
level algorithm will attempt to see if a plausible group or motion hypothesis is 
possible with that alternative hypothesis. Lower level abductive algorithms have 
been constructed that interact with a stubbed higher level problem solver, and work is 
continuing on the problem solver. 
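
To make the idea of choosing mutually consistent hypotheses across time instants concrete, here is a minimal sketch. It is not the algorithm developed in the project: the consistency measure (Jaccard overlap of co-grouped unit pairs), the `weight` parameter, the use of dynamic programming, and all function names are assumptions introduced only for illustration. For simplicity the sketch also relies on unit identities, which the discussion above notes are useful but not essential.

```python
from itertools import combinations


def same_group_pairs(grouping):
    """Set of unit pairs that share a group in this grouping hypothesis."""
    pairs = set()
    for group in grouping:
        pairs.update(combinations(sorted(group), 2))
    return pairs


def consistency(g1, g2):
    """Jaccard overlap of co-grouped pairs; 1.0 means identical membership."""
    p1, p2 = same_group_pairs(g1), same_group_pairs(g2)
    if not p1 and not p2:
        return 1.0
    return len(p1 & p2) / len(p1 | p2)


def choose_hypotheses(alternatives, weight=1.0):
    """Pick one grouping hypothesis per time instant.

    alternatives[t] is a list of (local_score, grouping) pairs for instant t.
    Maximises the sum of local scores plus weight * consistency between
    neighbouring instants, and returns the chosen index for each instant.
    """
    best = [score for score, _ in alternatives[0]]
    back = []
    for t in range(1, len(alternatives)):
        new_best, pointers = [], []
        for score, grouping in alternatives[t]:
            candidates = [
                best[j] + weight * consistency(prev_grouping, grouping)
                for j, (_, prev_grouping) in enumerate(alternatives[t - 1])
            ]
            j_star = max(range(len(candidates)), key=candidates.__getitem__)
            new_best.append(candidates[j_star] + score)
            pointers.append(j_star)
        best = new_best
        back.append(pointers)
    idx = max(range(len(best)), key=best.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    return path[::-1]


# Two alternative groupings of four hypothetical units a, b, c, d.
ab_cd = [{"a", "b"}, {"c", "d"}]
ac_bd = [{"a", "c"}, {"b", "d"}]
alternatives = [
    [(1.0, ab_cd), (0.2, ac_bd)],
    [(0.5, ab_cd), (0.6, ac_bd)],  # locally, ac_bd scores slightly higher
    [(1.0, ab_cd), (0.2, ac_bd)],
]
print(choose_hypotheses(alternatives))  # [0, 0, 0]
```

In this toy case the middle instant's locally best hypothesis is rejected because it would imply a radical change of group membership between neighbouring instants, which matches the remark that the overall hypothesis need not be a sequence of best hypotheses at each instant.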



References 

[1] Emmerman, P.J., Walrath, J.D., Winkler, R.P. Making Complex, Multidimensional 
Battlefield Information Intuitive. 21st Army Science Conference, Norfolk, VA, (1998). 

[2] Josephson, J.R., Josephson, S.J. Abductive Inference: Computation, Philosophy, 
Technology. New York, Cambridge University Press, (1994). 



Bar-Gain Boxes: 

An Informative Illustration of the Pairing Problem 



Kevin Burns 



The MITRE Corporation 

202 Burlington Road, Bedford, MA 01730-1420, USA 
kburns@mitre.org 



Abstract. A practical problem in command and control is to assign assets (e.g., 
bomber planes) to targets (e.g., hostile sites), one-on-one, in order to optimize 
an overall operation. The asset-target pairing must be completed quickly (before 
targets act), and the expected effectiveness of an asset against a target depends 
on a number of factual and judgmental factors. Here I present a diagram called 
“Bar-Gain Boxes” designed to help people solve the problem. The diagram uses 
a matrix of boxes to illustrate the possible pairings, along with color-coded bars 
(in each box) to illustrate the gain associated with each individual asset-target 
pair. The diagram is informative because it displays algorithmic results and 
underlying reasons, for normative and alternative solutions. 



1 Pairing Problem 

A current challenge in command and control (e.g., Gulf Wars) involves assigning 
assets (e.g., attack aircraft) to time-critical targets (e.g., enemy entrenchments), 
typically one asset to each target. Given N assets and N targets, there are N! possible 
solutions, and with time-critical targets an effective (optimal) solution must be found 
quickly (before targets act). 

The optimal solution can be computed by numerical methods, such as exhaustive 
enumeration (for problems with small N) or an auction algorithm [1], once a value 
function and its input data are specified. The value function gives the expected utility 
(gain) of an individual asset against an individual target. The problem is that, in 
practical applications, the value function is often incomplete and its input data are 
often uncertain. Thus, the final decision (on a set of N asset-target pairs) is made by a 
human being with help from a support system. 
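
For small N, the exhaustive enumeration mentioned above can be sketched directly. The code below is only an illustration under stated assumptions: the function name and the example gain matrix are hypothetical, and the gain of each asset-target pair is taken as already computed by some value function.

```python
from itertools import permutations


def rank_pairings(gain):
    """Enumerate all N! one-to-one pairings, best total gain first.

    gain[a][t] is the gain of pairing asset a with target t (N x N).
    Each solution is (total_gain, targets), where targets[a] is the
    target assigned to asset a.
    """
    n = len(gain)
    solutions = []
    for targets in permutations(range(n)):
        total = sum(gain[a][targets[a]] for a in range(n))
        solutions.append((total, targets))
    solutions.sort(key=lambda solution: solution[0], reverse=True)
    return solutions


# Hypothetical gains for 3 assets (rows) against 3 targets (columns).
example_gain = [
    [5, 1, 0],
    [2, 4, 1],
    [0, 2, 6],
]
ranked = rank_pairings(example_gain)
print(len(ranked))  # 6
print(ranked[0])    # (15, (0, 1, 2))
```

Because the number of pairings grows as N!, this approach is practical only for small N; it does, however, yield the full ranking of alternative solutions rather than the optimum alone.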

A typical support system provides a table display that lists the N targets along with 
the one asset (for each target) and associated value assigned by the normative 
(optimal) solution. The human's dilemma is that the normative solution is only as 
good as the system's value function, yet the system does not show how the value 
function affects either the normative solution or alternative solutions (which may be 
preferable to the human, whose expertise may not be captured by the system's 
value function). Here I present a diagram that helps by displaying algorithmic results 
and underlying reasons, for normative and alternative solutions. 

2 Bar-Gain Boxes 

In a typical pairing problem, the value function that gives the gain (G) of pairing an 
asset to a target has three terms: G = (U_T*P_T) - (U_O*P_O) - (U_A*P_A). U_T is the utility of 
an emergent, time-critical target (T) to which an asset (A) can be paired. P_T is the 
probability that A will be effective against T. Thus, U_T*P_T is the expected utility 
(score) of A against T. Similarly, U_O*P_O is the expected utility (cost) of diverting A 
from its original, scheduled mission. Finally, U_A*P_A is the expected utility (risk) of 
losing A, from threats by or near T. 

In words, the value function can be expressed as follows: Pairing Gain (G) = 
Target Score (U_T*P_T) - Divert Cost (U_O*P_O) - Asset Risk (U_A*P_A). Using the standard 
military convention of “red” to denote enemy (Target Score) and “blue” to denote 
friendly (Asset Risk), and using “black” to denote the Pairing Gain and “yellow” to 
denote the Divert Cost, the value function can be expressed in colors as follows: 

black (Pairing Gain) = red (Target Score) - yellow (Divert Cost) - blue (Asset Risk) 
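
The three-term value function can be written out as a short function. This is a sketch only: the parameter names and the numbers in the example are hypothetical, chosen to show how the colored terms combine into the black bar.

```python
def pairing_gain(u_target, p_target, u_divert, p_divert, u_asset, p_asset):
    """Pairing Gain = Target Score - Divert Cost - Asset Risk."""
    target_score = u_target * p_target  # red bar
    divert_cost = u_divert * p_divert   # yellow bar
    asset_risk = u_asset * p_asset      # blue bar
    return target_score - divert_cost - asset_risk  # black bar


# Hypothetical utilities and probabilities for one asset-target pair.
print(pairing_gain(u_target=100, p_target=0.8,
                   u_divert=40, p_divert=0.5,
                   u_asset=50, p_asset=0.2))
```

Keeping the three terms as named intermediate values mirrors the diagram's intent: the user can inspect the reasons (colored bars) as well as the result (black bar).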

“Bar-Gain Boxes” (Fig. 1) is a color-coded diagram that illustrates the structure of 
the value function and pairing solutions. Fig. 1a shows the legend. Fig. 1b shows a 
matrix where each box represents a possible asset-target pair and the bold frames 
highlight the optimal solution. The small numbers in the boxes provide the ranking of 
multiple solutions (actually 3!=6 solutions are possible), i.e., the three boxes labeled 
“1” are the optimal solution, “2” are the second best solution and “3” are the third best 
solution. In this way, users (e.g., targeteers in a command center) can see both the 
results (black bars) and the reasons (colored bars) for each asset-target pair, for both 
the normative (rank 1) and alternative (rank 2, 3, etc.) solutions, before they make a 
final decision. 

3 Solution Summary 

The display design also shows a “Solution Summary” (Fig. 1c) that compares the 
normative solution (rank 1) to alternative solutions (ranks 2 and 3). In this case, the 
user can see (Fig. 1c) that three solutions actually have equivalent Pairing Gains 
(black bars); hence the user may select solution 3 instead of solution 1 if his 
preference (not captured by the value function) is to minimize Divert Cost (yellow 
bar). Similarly, even if the user selects solution 1, he may prefer not to divert A (from 
its original mission to β) since the gain (black bar) for the pair {A, β} in Fig. 1b is so 
small. 

[Figure 1, panel (a) legend: Target Score, Divert Cost, Asset Risk, Pairing Gain] 

Fig. 1. An informative illustration of the pairing problem. (a) Color coding: Legend shows the 
terms in the asset-target value function. (b) Bar-Gain Boxes: Matrix of boxes with bars showing 
gain (black) and other terms (colors) in the value function for each asset-target pair. Assets 
(columns) are denoted A, B, C. Targets (rows) are denoted α, β, γ. Small numbers at the bottom 
of boxes denote solution rankings, each solution comprising N boxes in the NxN matrix. (c) 
Solution Summary: Comparison of possible solutions (ranked 1, 2, 3 by the support system) 

Abstract. The need for Bayesian inference arises in military intelligence, 
medical diagnosis and many other practical applications. The problem is that 
human inferences are generally conservative by Bayesian standards, i.e., people 
fail to extract all the certainty they should from the data they are given. Here I 
present a diagram called “Bayesian Boxes” designed to correct conservatism. 
The diagram uses colored lines and boxes to illustrate the Bayesian posterior 
and the underlying principle. Compared to other diagrams, Bayesian Boxes is 
novel in illustrating the conceptual features (e.g., hypotheses and evidence) and 
computational structure (e.g., products and ratio) of Bayesian inference. 



1 Cognitive Conservatism 

Assume that you hold a Poker hand of 4 Kings and 1 Queen. Blindfolded, you 
randomly select one card from this hand. I then roll a fair die. If the die shows a 
number between 1 and 4, I will tell the truth; if the die shows 5 or 6, I will tell a lie. 
After rolling the die, I look at your card and say “King”. Based on the number of 
Kings and Queen, the chances are 4/5=80% that your selected card is a King. Based 
on what I say after rolling the die, the chances are 4/6=67% that your selected card is 
a King. What do you think are the chances that your selected card is a King? 

In laboratory experiments [1] on this question (and other similar problems), most 
subjects report a probability <80%, and many report <67%. Yet, the Bayesian 
posterior is 89%. From a practical perspective, this “conservatism” [2] is of special 
concern in high stakes and time-critical domains (military, medical, etc.) where it is 
important to extract all of the certainty available in the data. Here I present a support 
system designed to correct conservatism with a diagram called “Bayesian Boxes” [1]. 

2 Bayesian Boxes 

Using “K” to denote the hypothesis (i.e., that the card is a King) and “k” to denote the 
evidence (i.e., that I say the card is a King), the problem is to compute P(K|k). Using 
“~” to denote negation, Bayes Rule can be written as follows: 

P(K|k) = P(k,K) / [P(k,K) + P(k,~K)] 
       = P(K)*P(k|K) / [P(K)*P(k|K) + (1-P(K))*(1-P(k|K))] .   (1) 

Here, P(~K) is equal to (1-P(K)) because K and ~K are mutually exclusive and 
exhaustive hypotheses, and P(k|~K) is equal to (1-P(k|K)) because the problem states 
that I will tell either the “truth” (k for K; ~k for ~K) or a “lie” (k for ~K; ~k for K). To 
further simplify, Eq. (1) is rewritten as follows using the notation P = Prior = P(K), 
L = Likelihood = P(k|K) and B = Bayesian posterior = P(K|k): 

B = P*L / [P*L + (1-P)*(1-L)] . (2) 

Bayesian Boxes (Fig. 1) is a graphic display of Eq. (2), where each algebraic 
product (joint probability) is represented as a colored box with sides (lines) 
representing P and L. 

Fig. 1 is a screen shot of a colored calculator that implements Bayesian Boxes. To 
solve the Card Quiz, a user simply moves the small black hashes to input the Prior P 
and Likelihood L. The Bayesian posterior B is automatically computed and displayed 
as output at the top of the diagram, in both digital (numerical) and analog (colored 
line) terms. The Bayesian principle, Eq. (2), is graphically depicted by the relative 
sizes of the colored boxes, i.e., the posterior line lengths (Red:Blue) are given by the 
ratio of box areas (Red:Blue). In Fig. 1, the ratio of box areas is 8:1 (Red:Blue), and 
these are the posterior odds for B given P=4:1 and L=2:1. 
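
The Card Quiz numbers can be checked with a few lines of code. The sketch below simply evaluates Eq. (2) with exact fractions; the function and variable names are hypothetical, while the prior (4 Kings in 5 cards) and likelihood (truth on 4 of 6 die faces) come from the problem statement.

```python
from fractions import Fraction


def bayesian_posterior(prior, likelihood):
    """Eq. (2): B = P*L / [P*L + (1-P)*(1-L)]."""
    joint_k = prior * likelihood                  # P(k,K), the red box
    joint_not_k = (1 - prior) * (1 - likelihood)  # P(k,~K), the blue box
    return joint_k / (joint_k + joint_not_k)


prior = Fraction(4, 5)       # 4 Kings among 5 cards
likelihood = Fraction(4, 6)  # truth is told on 4 of the 6 die faces
posterior = bayesian_posterior(prior, likelihood)

print(posterior)                    # 8/9, about 89%
print(posterior / (1 - posterior))  # 8, i.e. posterior odds of 8:1
```

The two joint probabilities, 8/15 and 1/15, are the areas of the red and blue boxes, so their 8:1 ratio gives the posterior odds directly.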



3 Other Options 

A Venn Diagram (see Fig. 2a) uses overlapping areas to illustrate the conceptual 
property of joint occurrence, like “k and K”, denoted (k,K). However, it does not 
illustrate the numerical probabilities, like P(k,K), which are needed for Bayesian 
inference (see Eq. (1)), nor does it identify which joint occurrences are needed to 
compute the posterior or how this posterior should be computed. A Beam Diagram [3] 
(see Fig. 2b) is more helpful because it represents numerical probabilities by relative 
length (in the x direction), and because it illustrates how to compute the posterior via 
two beam-cutting steps labeled (1) and (2) in Fig. 2b. A Space Diagram [4] (see Fig. 
2c) is similar except that it represents each step in the beam-cutting process with a 
different horizontal line (at different y coordinates). 

Bayesian Boxes is novel because it uses both dimensions (x and y) to represent 
numerical probabilities, and because it uses a graphical distinction (between 
horizontal and vertical) to represent a conceptual distinction that is critical to 
Bayesian inference. That is, horizontal lines represent probabilities of hypotheses,
e.g., P(K) and P(K|k), while vertical lines represent probabilities of evidence, e.g.,
P(k|K) and P(k|~K). Similarly, 2-D boxes represent joint probabilities, e.g., P(k,K)
and P(k,~K), while 1-D lines represent individual probabilities, e.g., P(K), P(k|K) and
P(K|k). Bayesian Boxes is also novel in its use of color to highlight the joint 
probabilities (boxes) used to compute the posterior result, i.e., the joint probabilities 
that are not part of the Bayesian calculation are left as uncolored (white) boxes. 

[Fig. 1 screen shot: Bayesian posterior B = 89% : 11%; Prior P = 80% : 20%.
Legend: Posterior Red:Blue is P(K|k):P(~K|k); Likelihood Red:Blue is P(k|K):P(k|~K);
Prior Red:Blue is P(K):P(~K)]

Fig. 1. Bayesian Boxes, as applied to the Card Quiz. Red denotes “K” and Blue denotes “~K”




Another advantage of color is that it enhances the perceptual grouping of different 
elements (lines and boxes) related to the same hypothesis. This is useful for scaling 
the diagram to more complex problems, e.g., more colors can be used to represent 
more hypotheses, and sequential updates can be represented with a series of diagrams 
(same colors) where each posterior becomes the prior for the next Bayesian update. 
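
The sequential updating described above can be sketched as repeated application of Eq. (2), with each posterior fed back in as the next prior. The function name `update` and the two likelihood values are hypothetical and chosen only for illustration.

```python
def update(prior, likelihood):
    """One Bayesian update using Eq. (2)."""
    joint_k = prior * likelihood
    joint_not_k = (1 - prior) * (1 - likelihood)
    return joint_k / (joint_k + joint_not_k)

belief = 0.8                      # prior from the Card Quiz
history = [belief]
for L in (2 / 3, 2 / 3):          # hypothetical likelihoods for two pieces of evidence
    belief = update(belief, L)    # the posterior becomes the next prior
    history.append(belief)
```

Each entry of `history` would correspond to one diagram in the series.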

Abstract. STARK-Roster is a novel diagrammatic interface for a complex
rostering problem. It may overcome the problems caused by the inevitable 
visual complexity of interfaces for such a complex domain. It encodes the 
inherent conceptual structure of the personnel scheduling task using a globally 
coherent and transparent representational scheme. This further demonstrates the 
utility of the Representational Epistemological Interface Design approach. 



Introduction - REEP Interface Design 

How should interfaces for knowledge rich and information intensive problems be 
designed? Many methodologies and approaches are suggested in the fields of HCI, 
Cognitive Engineering, Information and Scientific Visualisation. They typically
analyze the nature of the task to be supported, or focus on the types of information in 
the domain. Representational epistemological (REEP) interface design is an 
alternative approach, which advocates analyzing the inherent structure of the 
knowledge underpinning the target domain and encoding the identified conceptual 
dimensions in representational schemes that make the conceptual structure transparent 
[2]. Designing interfaces at the knowledge level, rather than at the task or 
informational level, gives solutions that support multiple tasks and makes 
understanding, solving problems and learning about domains easier. REEP interfaces 
can transform the nature of problem solving in a domain [3]. REEP principles have
been proposed [1] and can be used as design guidelines for effective representational
systems that capture the conceptual dimensions of complex knowledge rich domains.
Some of these principles contend that different conceptual dimensions (degree of 
abstraction, alternative perspectives) should be integrated and that a coherent 
overarching interpretive scheme should be provided to make global conceptual 
equivalences apparent whilst also revealing local conceptual distinctions. 

Novel representations have been designed and evaluated in educational domains 
(mathematics and physics) and for real world scheduling problems (university 
examination timetabling) [1,3]. Here a further example is presented of how REEP 
interface design can guide the invention of a novel representation that may effectively 
support problem solving in a demanding personnel scheduling domain, specifically 
nurse rostering. Of particular interest is how the complexity of the domain seems to 
require interface designs with multiple windows/views for all the different forms of
information. A single window interface will be visually complex and usually
considered to be a poor design. However, using the REEP approach a new interface 
has been designed that appears to use visual complexity to its advantage. 



Novel Nurse Rostering Representation - STARK-Roster 



Nurse rostering is an especially demanding personnel scheduling problem because of 
the variety of constraints that exist. These include: (a) staff requirements for each
shift, specified in terms of the number and type of nurses, in turn based on attributes 
such as rank, skills and qualifications; (b) total working hours requirements in a given 
period (e.g., week); (c) preferences for shifts that a nurse wishes to work or to be kept 
free for leave; (d) prohibitions on working consecutive shifts. Conceptually these 
requirements are quite different, some being grounded in set theoretic notions and 
others in arithmetic relations. Information about individual nurses and actual 
assignments must also be encoded. Hence, creating an effective interface for this 
domain is a stringent test for any approach to interface design. 

Figure 1 shows the novel STARK-Roster 1 interface. According to one REEP
guideline, distinct conceptual differences should be made apparent. Hence, main
columns represent days, and sub-columns are shifts in a day. Rows are particular nurses.
The subdivisions within each shift represent different nurse grades and skills. Each
row consists of a “pipe”. A “plug” in the pipe is an assignment of a nurse to a shift. A
shift preference is shown by an “annulus” around the pipe: plain (with rounded
corners) for an ‘on’ shift preference and with a cross for an ‘off’ shift preference.




Fig. 1 . STARK-Roster Interface. A one week by seven nurse portion of the full roster is shown 



1 Semantically Transparent Approach to Representing Knowledge 

According to another REEP guideline, all these conceptual dimensions should be
integrated within a coherent overarching interpretive scheme. What is common to all 
of the conceptual dimensions - the critical insight - is the notion of the degree to 
which requirements are satisfied. Hence, a simple representational scheme with 
colours is used: grey - in the desired range; white - insufficient; black - excess. A 
black annulus with a cross means an assignment has been made for a shift that a nurse 
preferred not to work. A plain annulus in grey means a nurse's preference to be 
working is satisfied, whereas a white one means a failure of the requirement. A grey 
pipe indicates the total time for a nurse is within limits, but under or over allocation is 
shown by a white or black pipe, respectively. The satisfaction of nurse grade/skill 
requirements is shown by the colour of the thin bars in each shift column. 
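
The three-colour scheme can be sketched as a single mapping from a quantity and its desired range to a colour. The function name, the nurse identifiers and the 30-40 hour limit below are hypothetical; the original interface may compute satisfaction differently for each conceptual dimension.

```python
def satisfaction_colour(actual, minimum, maximum):
    """Map the degree to which a requirement is satisfied to a colour."""
    if actual < minimum:
        return "white"   # insufficient
    if actual > maximum:
        return "black"   # excess
    return "grey"        # in the desired range

# Hypothetical weekly hours for three nurses, with an assumed 30-40 hour limit.
hours = {"nurse_1": 24, "nurse_2": 36, "nurse_3": 44}
pipe_colours = {name: satisfaction_colour(h, 30, 40) for name, h in hours.items()}
```

Because the same mapping applies to every dimension, one function is enough to colour pipes, annuli and grade/skill bars in this sketch.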

STARK-Roster obviously has high visual complexity. However, this may not be
a deficiency because the overarching interpretive scheme makes higher order relations 
readily apparent as emergent features. Regions that have many white objects, or 
mostly black objects, such as those indicated by the circles to the left and right of Fig. 
1, are problem areas. A preponderance of white means multiple requirements have
not been met and a surfeit of black means excessive allocations. These emergent 
properties may, in turn, support effective roster improvement strategies. For example, 
reallocating nurses from black regions to white regions is likely to resolve under and 
over satisfaction of requirements, without changing the overall amount of work 
allocated. The suitability of particular reassignments can be judged by attending to the 
local distribution of white-grey-black objects across the distinct conceptual 
dimensions. 

In conclusion, the design of STARK-Roster is a further demonstration of the utility of
the REEP approach to interface design. Our empirical evaluations of the apparent
benefits of the STARK-Roster diagram will test the claim that visual complexity
can be an acceptable feature of an interface provided the interface is underpinned by a 
coherent global knowledge level interpretive scheme. 

1 Introduction

Complex products, such as airplanes or helicopters, can contain thousands of
different components, connected in different ways and designed by a large and
disparate multidisciplinary team whose members each use their own tools and
representations. Each tool, such as a CAD system, provides a detailed view of a
particular type of information, for example geometry. However, no current tool
provides a high level abstract overview of the product, which is required for many
design decisions, forcing designers to wade through large amounts of detailed data
in different representations. Industry needs visualization techniques that allow
designers to interact with the data at different levels of abstraction, to gain both
an overview and a detailed understanding of a product.

This paper will briefly introduce visualization techniques being developed at the 
Engineering Design Centre (EDC) of Cambridge University Engineering Department. 
These techniques support engineers in changing existing products to meet new needs 
and requirements. 

2 Diagrams in Product Change 

2.1 The Design Structure Matrix 

The Design Structure Matrix (DSM) is a standard design research tool for modelling 
dependencies between components, process steps or people; direct linkages or 
connections between two elements of the same type are highlighted (see Browning
[2], August et al. [1] and figure 1). A DSM is mathematically equivalent to an 
adjacency matrix for graphs. 

DSMs visualize a product or process at a chosen level of abstraction. However, the
DSM grows with the complexity of the model, making it less effective. It shows 
which parts are directly connected, but does not show indirect connections (e.g. 
Component A is connected to component B, B is connected to C, so A is connected to 
C as well). 
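
Since a DSM is equivalent to an adjacency matrix, the indirect connections it does not show can be recovered by computing a transitive closure. The sketch below uses Warshall's algorithm on a hypothetical three-component matrix; it is an illustration and not the authors' tool.

```python
def indirect_connections(dsm):
    """Transitive closure (Warshall's algorithm) of a boolean DSM."""
    n = len(dsm)
    reach = [row[:] for row in dsm]          # copy so the input is unchanged
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True   # i reaches j via k
    return reach

components = ["A", "B", "C"]
dsm = [
    [False, True, False],    # A is connected to B
    [False, False, True],    # B is connected to C
    [False, False, False],
]
reach = indirect_connections(dsm)
# reach[0][2] is True: A is indirectly connected to C, although dsm[0][2] is False.
```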

The DSM data can be the basis for further analysis. The Combined Risk Plot (see 
Clarkson et al. [1] and figure 2) has been developed to show the risk of change 
propagating through a product in terms of the impact and the likelihood of a change 
to one component affecting others. The same representation can be used for direct 



A. Blackwell et al. (Eds.): Diagrams 2004, LNAI 2980, pp. 388-391, 2004. 
© Springer-Verlag Berlin Heidelberg 2004



Visualization Techniques for Product Change and Product Modelling 389 



and indirect risk of change spreading. The width of the rectangle represents the 
likelihood of a change and its height is proportional to the impact. The area of the 
corresponding rectangles represents the risks. Colour coding indicates the severity of 
the risk. Connections with high risks can be highlighted with a different colour (here: 
red). 
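
The rectangle encoding amounts to treating risk as the product of likelihood and impact. In the minimal sketch below, the link values, the 0-1 scales and the highlighting threshold are all hypothetical.

```python
def combined_risk(likelihood, impact):
    """Risk as the area of a rectangle: width (likelihood) x height (impact)."""
    return likelihood * impact

# Hypothetical likelihood and impact values (both on a 0-1 scale) for three links.
links = {
    ("A", "B"): (0.9, 0.8),
    ("A", "C"): (0.5, 0.2),
    ("B", "C"): (0.3, 0.9),
}
HIGH_RISK = 0.5   # assumed threshold for highlighting a link in red

risks = {link: combined_risk(lik, imp) for link, (lik, imp) in links.items()}
highlighted = [link for link, risk in risks.items() if risk > HIGH_RISK]
```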




Fig. 1. Design Structure Matrix 




Fig. 2. Combined Risk Plot 

Fig. 3. Link Connection Plot

2.2 The Link Connection Plot 

The Link Connection Plot is a plot showing all the linkages between different 
components of a product, such as geometric, thermal or electric linkages. The 
components are visualized as nodes of a network, and the linkages between them are
shown as directed arrows (see Jarratt [5] and figure 3). 

Different linkage types are drawn in different colours to distinguish between them. 
It is possible to suppress certain kinds of connections, e.g. if one is just interested in 
the spatial connection between components, other connection types can be hidden. 
The Link Connection Plot uses different algorithms based on mathematical graph theory.
With this diagram it is possible, for example, to detect indirect connections between
components, just by following the path from one component to another. The
disadvantage of this form of visualization is that it is highly dependent on the layout 
of the network. Different layouts can lead to different interpretations of the structure 
and finding a good layout can be very difficult. 
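
Suppressing linkage types and following paths between components can be sketched with a breadth-first search over typed, directed links. The component names and linkage data below are hypothetical, and the search is a generic graph technique, not necessarily one of the algorithms used in the tool.

```python
from collections import deque

def find_path(links, start, goal, allowed_types=None):
    """Breadth-first search over directed links, optionally filtered by type."""
    graph = {}
    for source, target, kind in links:
        if allowed_types is None or kind in allowed_types:
            graph.setdefault(source, []).append(target)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None   # no connection under the chosen filter

# Hypothetical linkages: (source, target, linkage type)
links = [
    ("engine", "gearbox", "geometric"),
    ("gearbox", "rotor", "geometric"),
    ("engine", "cabin", "thermal"),
]
all_types = find_path(links, "engine", "rotor")
geometric_only = find_path(links, "engine", "cabin", {"geometric"})
```

With all linkage types shown, the engine reaches the rotor through the gearbox; with thermal links hidden, no path from the engine to the cabin remains.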

2.3 The Component Connection Plot 

While the Link Connection Plot shows the entire product, the Component Connection 
Plot focuses just on one component of the product. As with the Link Connection Plot 
it uses a network to visualize the connections. Unlike the Link Connection
Plot, one component is centred on the screen and only those components that
have a direct connection to this component are visible (see figure 4).

This form of displaying connections leads to less crowded diagrams than the 
approach shown in figure 3, but does not show indirect connections between 
components. However, it facilitates the display of large structures, as the parts of the
network that the viewer is not interested in are hidden in this display.

Fig. 4. Component Connection Plot



3 Outlook 

The diagrams briefly introduced in this paper are examples of visualization 
techniques used in product change and product modelling in engineering design. 
Each of these diagrams has its weaknesses and further research is required to find 
better solutions to deal with even more complex designs. So far the visualization has 
received positive feedback in informal discussions with future users. We are
planning a more formal evaluation, for example making use of cognitive dimensions
(Green [6]), and plan to add further visualisation techniques to our tool.

These approaches can range from adding hierarchy structures to DSMs and 
aggregating closely related components to finding good layout algorithms for the 
Link Connection Plot. All the diagrams introduced above have few interactive 
capabilities. Allowing more interactive features with them could help users to find 
interesting features in the design as well. 

Abstract. A composite cluster map displays a fuzzy categorisation of
geographic areas. It combines information from several sources to provide 
a visualisation of the significance of cluster borders. The basic technique 
renders the chance that two neighbouring locations are members of dif- 
ferent clusters as the darkness of the border that is drawn between those 
two locations. Adding noise to the clustering process is one way to obtain 
an estimate about how fixed a border is. We verify the reliability of our 
technique by comparing a composite cluster map with results obtained 
using multi-dimensional scaling. 



Projecting Classifications Geographically 

A large variety of applications (ranging from image segmentation to data mining) 
have made use of clustering techniques [2]. Clusters may be visualised as an 
aid in identifying similar attributes, as well as to identify significant classes of 
individuals, the task we focus on here. Visualisation of geographic information 
is extensively studied by Bertin [1]. 

Iterative clustering produces a hierarchical categorisation that can be repre- 
sented by a DENDROGRAM, i.e. a tree showing the history of the clustering pro- 
cess. Each time two elements are fused, a new node is introduced with branches 
to the fused elements. The length of a branch reflects the COPHENETIC distance, 
the distance between the elements when they fuse. 

Cutting the dendrogram anywhere along a line perpendicular through its 
branches gives you a clean cluster division: each element is sorted into one of
several groups (see Fig. 1). To inspect for geographic influences in the data, we 
project this classification onto a map, making use of standard tiling techniques 
(see Fig. 1). This is useful, but to a limited extent, because the map shows a clear 
division in a number of equal groups (a rather arbitrarily chosen number), that 
may not reflect reality, and at best reflects a small fraction of the information 
in the dendrogram. Fig. 2 notes a second problem with the standard projection, 
namely that there is no reflection of how significant the borders are. 

The composite CLUSTER MAP is obtained by collecting chances that pairs 
of neighbouring elements are part of different clusters. The order in which hier- 
archical clustering proceeds gives one estimate: the later two elements are joined, 

Fig. 1. A classification of pronunciation via edit distance [3] when subjected to clustering
yields a dendrogram (left). We can project any arbitrary level of the dendrogram
to create a categorial map (right), which, however, loses a great deal of the information
in the original classification




Fig. 2. We examine the distinctness of the groups classified in Fig. 1 by applying 
multi-dimensional scaling (MDS): items are located in the plane so that the relative 
distances between them correlate optimally with their mutual differences. The icons 
used in this display correspond to those in the dendrogram and map in Fig. 1. The 
dendrogram in Fig. 1 suggests that the groups ought to be distinguished well, as there 
is a reasonable horizontal distance between the groups and the nodes that subsume 
them. The MDS plot (all, left) demonstrates that the north-south break in the data 
is indeed robust, as are some southern distinctions (2nd) but the details (3rd and 4th 
plots) are less encouraging about the degree to which the data clusters naturally. We 
will look for this structure in the composite cluster map as well 



394 Peter Kleiweg et al. 





Fig. 3. A composite cluster map is obtained by drawing higher levels of clustering as 
borders of increasing darkness (left). Alternatively we repeatedly cluster the same data 
with variable amounts of noise (right). Note that the North-South division prominent 
in the MDS analysis (Fig. 2) emerges clearly 



the larger the chance they belong to different clusters. The cophenetic differences 
provide another estimate. 

Because hierarchical clustering is inherently unstable [2], we add noise to the 
clustering, and combine the results of many clustering runs. In fact, we exploit 
the instability, which is usually considered to be a weakness, to distinguish natu- 
rally sharp borders, shown as dark lines, from transition areas, shown as nets of 
light lines (Fig. 3, right). 
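
The noise-based estimate can be sketched as follows: cluster the same distance data many times with random noise added, and record for each pair of neighbours the fraction of runs in which the two fall into different clusters. The single-linkage clustering, the uniform noise model, the fixed number of clusters and the toy data are all assumptions made for illustration; the original method may differ in each of these respects.

```python
import random

def cluster(dist, k):
    """Naive single-linkage agglomerative clustering down to k groups."""
    groups = [{i} for i in range(len(dist))]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(dist[i][j] for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] |= groups[b]     # fuse the two closest groups
        del groups[b]
    return {i: g for g, members in enumerate(groups) for i in members}

def border_darkness(dist, neighbours, k=2, runs=50, noise=0.5, seed=0):
    """Fraction of noisy clustering runs that separate each neighbouring pair."""
    rng = random.Random(seed)
    n = len(dist)
    split = {pair: 0 for pair in neighbours}
    for _ in range(runs):
        noisy = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, noise)
        label = cluster(noisy, k)
        for i, j in neighbours:
            if label[i] != label[j]:
                split[(i, j)] += 1
    return {pair: count / runs for pair, count in split.items()}

# Toy data: five locations along a line, with a large gap between the third and fourth.
values = [0.0, 1.0, 2.0, 10.0, 11.0]
dist = [[abs(a - b) for b in values] for a in values]
neighbours = [(0, 1), (1, 2), (2, 3), (3, 4)]
darkness = border_darkness(dist, neighbours)
```

In this toy example the border between locations 2 and 3 is separated in every run, so it would be drawn as the darkest line, while the other borders are never separated.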

The next steps. . . 

Our composite cluster maps give a more differentiated picture than a simple 
cluster division. The next step will be to inspect this ‘granularity’, comparing 
several clustering algorithms with results of other techniques, such as multi- 
dimensional scaling. A major test will be to have experts in the field of study 
evaluate these maps. 

1 Decision Diagrams in Credit-Risk Evaluation

One of the key decisions financial institutions have to make as part of their daily 
operations is to decide whether or not to grant a loan to an applicant. With the 
emergence of large-scale data-storing facilities, huge amounts of data have been 
stored regarding the repayment behavior of past applicants. It is the aim of credit 
scoring to analyze this data and build models that distinguish good from bad 
payers using characteristics such as amount on savings account, marital status, 
purpose of loan, etc. Many classification techniques have been suggested to build 
credit-scoring models. Neural networks in particular have in recent years received
a lot of attention. However, while they are generally able to achieve a high 
predictive accuracy rate, the reasoning behind how they reach their decisions is 
not readily available, which hinders their acceptance by practitioners. Therefore, 
in [1], we have proposed a two-step process to open the neural network black box 
which involves: (1) extracting rules from the network; (2) visualizing this rule 
set using an intuitive graphical representation, such as decision tables or trees. 

Clearly, an important criterion here is the size of the generated representa- 
tion. It has regularly been observed that the decision trees generated by machine- 
learning algorithms turn out to be too large to be comprehensible to human ex- 
perts. Hence, in this paper, we report on the alternative use of decision diagrams 
in this visualization step. More specifically, since we are dealing with general 
discrete (as opposed to binary) attributes, we will apply multi-valued decision 
diagrams (MDDs). An MDD is a rooted, directed acyclic graph, with m sink 
nodes for every possible output value (class). Each internal node v is labelled
by a test variable (attribute) $\mathrm{var}(v) = x_i$ ($i = 1, \ldots, n$), and has an outgoing
edge for every possible value that $x_i$ might take. An MDD is ordered (OMDD)
iff, on all paths through the graph, the test variables respect a given linear order
$x_1 \prec x_2 \prec \cdots \prec x_n$. An OMDD is said to be reduced iff it does not contain
a node v whose successor nodes are all identical, nor does it hold any isomorphic 
subgraphs. Conceptually, a reduced diagram can be interpreted as the result of 
the repeated application of two types of graph transformations: one is to bypass 
and delete redundant nodes (elimination rule), the other is to share isomorphic
subgraphs (merging rule).
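
The two reduction rules can be sketched on a small decision tree encoded as nested tuples. The encoding, the function names and the attribute names ("purpose", "savings") are hypothetical. Merging is modelled here by treating structurally identical subgraphs as a single shared node when counting, which is a simplification of a real MDD package.

```python
def reduce_mdd(node):
    """Apply the elimination rule bottom-up; sinks are strings, internal
    nodes are (variable, children) tuples."""
    if isinstance(node, str):
        return node
    var, children = node
    children = tuple(reduce_mdd(child) for child in children)
    if all(child == children[0] for child in children):
        return children[0]            # elimination rule: the test is redundant
    return (var, children)

def internal_nodes_in_tree(node):
    """Count internal nodes with every occurrence counted separately."""
    if isinstance(node, str):
        return 0
    return 1 + sum(internal_nodes_in_tree(child) for child in node[1])

def internal_nodes_in_mdd(node, seen=None):
    """Count distinct internal nodes, so isomorphic subgraphs are shared
    (merging rule)."""
    seen = set() if seen is None else seen
    if not isinstance(node, str) and node not in seen:
        seen.add(node)
        for child in node[1]:
            internal_nodes_in_mdd(child, seen)
    return len(seen)

# Hypothetical tree: a three-valued attribute followed by a two-valued one.
tree = ("purpose", (
    ("savings", ("good", "bad")),
    ("savings", ("good", "bad")),
    ("savings", ("bad", "bad")),
))
reduced = reduce_mdd(tree)
tree_size = internal_nodes_in_tree(reduced)
mdd_size = internal_nodes_in_mdd(reduced)
```

Here the third "savings" test is eliminated because both of its outcomes are "bad", and the first two "savings" nodes are shared, so the tree with three internal nodes shrinks to a diagram with two.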

Decision diagrams have to a great extent been studied and applied by the 
hardware design community [2]. Although their use has occasionally been proposed
in the machine-learning community (e.g. in [3]), they have so far not
gained wide acceptance in the latter problem context, partly because the theo- 
retical size advantages of subgraph sharing did not always directly materialize 
in (the relatively scarce) reported experiments. 

2 Empirical Evaluation on Real-Life Data 

The experiments were conducted on three real-life credit-risk evaluation data
sets: German credit (publicly available), and Bene1 and Bene2 (which were obtained
from two major Benelux financial institutions). We then investigated the
performance of two neural network rule extraction algorithms: Neurorule and 
Trepan. It was observed that Neurorule and Trepan consistently produce very 
accurate classifiers when compared to C4.5 and EODG [3], two other algorithms 
producing decision trees and diagrams, respectively. Also, the size of the C4.5- 
tree was in all cases prohibitively large for visualization purposes. 

In a second step, diagrammatic notations, while retaining the predictive accuracy
of the rule sets extracted in step one, can provide a more suitable visualization
for validating the knowledge as a whole or applying it to case-by-case decision 
making. Hence, we built an MDD representation from each rule set. To minimize 
their size, an exact variable order optimization procedure was applied. Figure 1 
depicts the MDD thus obtained from the Bene1 rule set extracted by Neurorule.
The results for all MDDs are listed in Table 1. In all cases, the diagrams were 
sufficiently concise to be easily understood and applied. To give an idea of the 
amount of subgraph sharing, we have included a column displaying the size of 
the equivalent decision tree obtained when the same (total) attribute ordering is 
adopted. To make the analysis fair, we avoid repetitive counting of sink nodes, 
and measure size in terms of the number of internal nodes. The percentage in 
the final column thus provides an indication of the effectiveness of the merging 
rule. We can conclude that, except for the German credit classifier produced by 
Trepan, substantial size gains are being achieved as a result of MDD reduction. 
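
The size-saving percentage in the final column of Table 1 can be reproduced from the two node counts. The sketch below assumes the saving is defined as one minus the ratio of MDD nodes to tree nodes, which matches the tabulated figures; the function and variable names are illustrative.

```python
def size_saving(mdd_nodes, tree_nodes):
    """Percentage of internal nodes saved by the MDD relative to the tree."""
    return 100 * (1 - mdd_nodes / tree_nodes)

# (internal nodes in min.-size MDD, internal nodes in matching tree) from Table 1
sizes = {
    ("German", "Neurorule"): (7, 14),
    ("German", "Trepan"): (7, 7),
    ("Bene1", "Neurorule"): (8, 12),
    ("Bene1", "Trepan"): (14, 29),
    ("Bene2", "Neurorule"): (11, 28),
    ("Bene2", "Trepan"): (16, 51),
}
savings = {key: round(size_saving(m, t), 1) for key, (m, t) in sizes.items()}
```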

Table 1. MDD size results

Data set  Extraction  Internal nodes     Internal nodes    Size
          method      in min.-size MDD   in matching tree  saving
German    Neurorule    7                 14                50%
          Trepan       7                  7                 0%
Bene1     Neurorule    8                 12                33.3%
          Trepan      14                 29                51.7%
Bene2     Neurorule   11                 28                60.7%
          Trepan      16                 51                68.6%




Fig. 1. Minimum-size MDD for Bene1/Neurorule



Feature Diagrams in Phonology 



Don Salting 

Department of English, North Dakota State University, 
Fargo, ND, 58105-5075, US 
donald.salting@ndsu.nodak.edu



Abstract. Phonology is the study of sound patterns in language. These patterns 
give evidence that the segments of speech - consonants and vowels - are 
composed of sub-segmental dimensions called Distinctive Features. Both 
segments and features are ultimately definable in physical terms - either 
articulatory or acoustic. A question that has dominated phonology from its 
generative inception concerns the possibility/degree of abstraction in the mental 
representation of these physical events. This poster examines attempts to 
diagrammatically represent Distinctive Features. Over time, these diagrams 
have become more abstract and simpler - evidence that perhaps physiology is 
informed by diagrams. 



1 Introduction 

Segments - consonants and vowels - are the building blocks of human language. It 
has long been understood that segments are not the primary components, but are in 
fact constellations of sub-segmental specifications called Distinctive Features. 
Distinctive Features can ultimately be described in physical terms of either 
articulation or acoustics. 

Phonologists study segment distribution patterns for clues to the makeup and 
organization of Distinctive Features. Many patterns are explainable as responses to 
physical considerations - either ease of articulation or acoustic salience (ease of 
comprehension). However, there are many patterns and asymmetries within patterns
that contradict these physiological exigencies. 

In the earliest literature, features were represented in a potentially random ‘bundle’.
Subsequent research indicated that certain classes of features occasionally act in
concert to the exclusion of other features. This was seen as evidence that the ‘bundle’
has an internal organization - a hierarchy. The hierarchy is most easily and clearly 
defined via diagrammatic representation. After demonstrating patterns of evidence for 
features and classes of features, I will offer a chronological survey of diagrammatic 
approaches to feature theory. We will see a movement from the physical to the 
abstract where the diagrams inform the physiology. 

1.1 The Evidence 

A common pattern in language that provides evidence for sub-segmental features is 
assimilation. In assimilation, segment α varies by exhibiting the same specification
for Feature 1 as segment β, thus assimilating to β's specification for Feature 1. A very
common and universal example can be seen in nasal assimilation. In English, when 
the negative prefix in- is added to a word, the [n] can assimilate to the Place of 
articulation of the following consonant. In English, this pattern is robust enough to 
have appeared in the spelling as well: witness in-tolerant vs. im-perfect. In imperfect, 
the /n/ 1 has assimilated to the bilabial Place of articulation of the following [p] and 
become [m]. However, the /n/ in in- never varies for any other property. Thus, the
assimilation is not complete, but partial. 
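
The nasal assimilation pattern can be illustrated with a toy rule that picks the prefix nasal according to the following consonant. The sketch works on spelling and uses only the first letter of the stem as a stand-in for its initial consonant; this is a simplifying assumption and not a phonological analysis.

```python
BILABIALS = {"p", "b", "m"}   # consonants articulated with both lips

def add_negative_prefix(stem):
    """Attach in-, with /n/ surfacing as [m] before a bilabial consonant."""
    nasal = "m" if stem[0] in BILABIALS else "n"
    return "i" + nasal + stem

words = [add_negative_prefix(stem) for stem in ("tolerant", "perfect")]
```

Only the Place of the nasal changes; every other property of the prefix stays the same, which is why the assimilation is partial.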

A more sophisticated form of partial assimilation can be seen in vowel harmony,
which occurs in many languages from many families. In vowel harmony, vowel α
(target) will appear with one of the qualities of vowel β (trigger) in a given domain.
Vowel articulation is definable as a point in a 2-dimensional vowel space delineated 
by a Y-axis of tongue Height and an X-axis of Color. The harmonic feature is one of 
the properties of either the X or the Y axis. The diagrams below will illustrate 
attempts to represent the components of the X and Y axes in a manner that will 
account for natural phenomena and variation. 

2 The Diagrams 

In this section I present a chronological overview of attempts to diagrammatically 
organize Features. We will see an evolution from concrete to abstract in the 
definitions of the features themselves, accompanied by simpler and more 
diagrammatically motivated hierarchies. Fig. 1 shows an early Feature Geometry [1]. 
With the single exception of [sonorant], all the features reference, and are grouped 
according to, specific articulatory gestures 2.

Of relevance to this paper are the features that describe vowel Place. In Fig. 1, they
are clustered under the Dorsal node ([low], [high], [back]), under the Tongue Root 
node ([tense]) and under the Labial node ([round]). Such an arrangement is 
problematic on several counts. First, of the three features linked to the Dorsal node, 
two reference Height (y-axis) and one references Color (x-axis). Further, the features 
for Height are spread out over two nodes as are the features for Color. This does not 
allow for a unified description of harmony that references an axis rather than a single 
feature. Second, the hierarchy implies that dorsal and labial consonants could 
influence vowel patterns (extremely rare). 

Subsequent research sought a universal arrangement of these features that would 
capture cross-linguistic phenomena. Many arrangements were proposed, but 
invariably they used the same, articulator-based features. The search proved futile and 
here the field splits. One camp took this as proof that there is no representation of any 
kind, and that language patterns are the product of conditions - constraints - on the 
output. This is the foundation of Optimality Theory, which is currently the dominant 
framework in phonology in the US [2]. 



1 The solidus - /n/ - indicates the abstract or underlying form of a segment; the bracket notation
- [n] - indicates the phonetic output.

2 This approach is a direct descendant of the features put forth in The Sound Pattern of English,
Chomsky & Halle (1968).

Other linguists considered that the fault lay not with the notion of universal
representations, but with the assumption that specific articulations are universal 
features. In Fig. 2 are two models for vowel Height features that have allowed for a 
degree of abstraction in the representation. In both cases, the hierarchy delimits the 
vowel Height continuum. That is, the diagram informs the physiology. 

The hierarchy in 2a [3] accounts for languages that exhibit chain-shift vowel 
harmony - a process where degrees of height are points on a continuum. However, 
this model requires internal modification to account for non-chain-shift harmony (the 
dominant variety). The hierarchy in 2b [4] accounts for the more prevalent types of 
height harmony. Further, 2b also accounts for many cases of Color (x-axis) harmony. 
Both models employ varying instantiations of a more abstract feature: [±open]. A 
property that both models in Fig. 2 allow for is that a given segment, though 
physically the same or similar in many languages, can have varying roles in the sound 
system of a given language. 





Fig. 3. Molecular hierarchy [5] 

While the models in Fig. 2 are a significantly abstract departure from previous
hierarchies, they still specifically address a physiological dimension - vowel Height, 
and utilize a feature specific to vowel Height. This implies that consonants and vowel 
Color reference different features. 

An even more abstract model is being proposed that suggests only two features: C 
(Consonant) and V (Vowel) [5]. This model suggests a universal hierarchy delimiting 
each of the many facets of speech sound. Each parameter has a Head category - C,V - 
and a Dependent category - c,v. The physiological specifics of the features [C,V,c,v] 
are determined by the dimension they delimit. For vowels, the higher vowels, being 
more closed in their aperture and thus more like consonants, are more C than lower 
vowels, which are more V. The hierarchy for vowel height is given in Fig. 3. 

In this model, the height continuum of vowels would be specified as in Table 1:

Table 1.

Cc        Cv        Vc        Vv

Highest <----------------> Lowest



The ultimate organization of the height continuum in the Molecular model (M) is 
identical to that in the Nested Subregister (NS) model (Fig. 2b). Further, both models 
assume that phonological primes (features) are not phonetically specific. The 
differences are that the NS model diagrammatically divides the vowel space, whereas 
the M model demarks featural associations and accounts for all dimensions of speech 
sounds. 

Evidence for the efficacy of any model comes from its ability to account for real- 
world data without recourse to ad hoc conditions. The models in Figs. 2-3 accomplish 
just that. 
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Abstract. Geographic Information Systems (GIS) are notoriously dif- 
ficult for users unfamiliar with the field, both for query formulation and 
for interpretation of display and results. Gaia is a graphical classification 
system which reduces this difficulty by using a sketch-like classification- 
by-example input interface, a fuzzy match of the classification against 
the GIS database, and the use of hue and saturation overlays to present 
results. This work presents Gaia’s interface and result visualization mech- 
anism. 



1 Introduction 

A Geographical Information System (GIS) provides a means to store, search, 
and display information referenced by its spatial position; as such, it may be 
regarded as the electronic equivalent of a map. Unfortunately, at the present 
time GISs tend to be much more difficult for novices to use than a paper map, 
with the result that the rich information within the GIS is not readily available. 
As observed in [1], numerous learning barriers often cause users to seek assistance
from more skilled GIS users with the knowledge and background necessary to
properly utilize the GIS. In [2], Egenhofer notes that text-based spatial queries
are often tedious, and that many visual interfaces tend to utilize the same syntax and
grammar as textual interfaces, causing them to suffer the same drawbacks.

In this work, we develop a sketch-like classification-by-example mechanism 
for defining spatial classes in a GIS, and combine it with a fuzzy approach to 
matching the GIS data with a saturation-based display of classification results. 
The approach is primarily useful for locating map regions with attributes which 
are similar to those of other, known locations in the map; for instance, locating 
areas whose elevation is close to that of a given location in the map, or whose 
land-use attributes match the attributes of another, known place. 

The technique considered here is expected to be useful in a number of areas: 
traditional GISs targeting novice users and a mechanism for specifying allowable 
routes for path-planning are currently under consideration. A prototype GIS 
named “Gaia” has been developed as an example of the approach, permitting 
users to search USGS Digital Elevation Models (DEMs) and Digital Line Graphs 
(DLGs). 
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2 Gaia: Classification by Sketch 

Figure 1(a) shows an elevation map with line graph elements of roads (colored 
tan in the on-line version of this paper), interstate highways (brown), streams 
(blue) and power transmission lines (yellow). 

The lower right portion of the screen provides the user with a list of available 
USGS attribute classifications, and allows the user to control which attributes 
are visible. When the user initiates classification, only the set of visible attributes
is used in determining positive examples.

The user is able to select pixels which should be regarded as examples to 
be matched. In Figure 1(b), we see some road pixels (red) selected. The user 
initiates classification by clicking the ’Classify’ button when they are satisfied 
that the current sketch contains members that are representative of the class 
they wish to create. 

The pixels selected by the user are used as examples of areas to be matched 
based upon a simple fuzzy classification scheme. In the current implementation 
a simple fuzzy set membership function is used that is based on the number 
of attributes present in any of the example pixels which are also present in the 
pixel being considered. All attributes are considered as simple boolean matches, 
except elevation which is matched through a similarity measure based upon the 
maximum and minimum elements in the example set. 
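The membership function described above can be sketched as follows. This is a minimal illustration and not Gaia's actual code: the `Pixel` type, the equal weighting of attributes, and the linear fall-off of the elevation similarity outside the example range are all assumptions, since the text only states that boolean attributes are counted and that elevation is compared against the minimum and maximum of the example set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pixel:
    elevation: float        # e.g. metres, as read from a DEM
    attributes: frozenset   # boolean attributes present, e.g. {"road"}


def elevation_similarity(elevation, lo, hi):
    """1.0 inside [lo, hi], falling linearly to 0.0 one range-width outside.

    The linear fall-off is an assumption made for this sketch.
    """
    width = max(hi - lo, 1.0)   # avoid a zero-width band for a single example
    if lo <= elevation <= hi:
        return 1.0
    distance = lo - elevation if elevation < lo else elevation - hi
    return max(0.0, 1.0 - distance / width)


def membership(pixel, examples):
    """Degree (0..1) to which `pixel` belongs to the class given by `examples`."""
    wanted = frozenset().union(*(e.attributes for e in examples))
    lo = min(e.elevation for e in examples)
    hi = max(e.elevation for e in examples)
    # One boolean score per attribute seen in any example, plus elevation.
    scores = [1.0 if a in pixel.attributes else 0.0 for a in wanted]
    scores.append(elevation_similarity(pixel.elevation, lo, hi))
    return sum(scores) / len(scores)
```

With two road pixels at elevations 100 and 120 as examples, a road at 110 scores 1.0, a road at 500 or a non-road pixel at 110 scores 0.5, and a non-road pixel at 500 scores 0.0.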

A classification performed upon the selection of Figure 1(b) will look for roads
whose elevation is close to that of the selected road area. Roads with different 
elevations, and areas with similar elevations but which are not roads, will be 




(a) User Sketching Environment Before Selection    (b) User Selection of Road (Red)

Fig. 1. Gaia User Environment
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Fig. 2. Visualization of classification resulting from sketch in figure 1(b) 



classified as poor matches. Non-road areas with a different elevation will not 
match at all. 

Classification inferences are displayed using a saturation-based mechanism 
similar to the hue and saturation based method described in [3]. In this case, we 
have different constraints on our visualization; we only need to use one dimension 
of the color space to show fuzzy set membership, and it is necessary to show both 
the results of the classification and the underlying map. 

We visualize set membership by showing pixels which are members of the 
set in a user-specified color (the “classification color” in Figure 1(a)). Positively
classified pixels are assigned the hue specified in the classification color, while 
saturation is calculated according to the degree of set membership. The result 
is to smoothly interpolate between the value shown by the underlying elevation 
relief map and a fully-saturated value corresponding to the classification color. 
Brightness is unchanged, leaving the three dimensional illusion provided by the 
relief map. The result for the query shown in Figure 1(b) is shown in Figure 2. In 
this figure, roads which are not in the specified altitude range are a light green, 
as are areas which are in the correct altitude range but are not roads. Roads in 
the specified altitude range appear as a “bright” (i.e., saturated) green.
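The overlay rule can be illustrated per pixel with Python's standard `colorsys` module. This is a sketch under stated assumptions rather than Gaia's implementation: the function name `overlay` is hypothetical, colors are (r, g, b) floats in 0..1, and saturation is assumed to be interpolated linearly between the map's own saturation and full saturation.

```python
import colorsys


def overlay(base_rgb, class_rgb, membership):
    """Blend one relief-map pixel toward the classification color.

    base_rgb, class_rgb: (r, g, b) floats in 0..1; membership: 0..1.
    """
    if membership <= 0.0:
        return base_rgb                    # not in the set: map is unchanged
    _, base_s, base_v = colorsys.rgb_to_hsv(*base_rgb)
    class_h, _, _ = colorsys.rgb_to_hsv(*class_rgb)
    # Hue is taken from the classification color; saturation grows with the
    # degree of set membership (linear interpolation is an assumption).
    s = base_s + (1.0 - base_s) * membership
    # Brightness (V) is left untouched, preserving the relief shading.
    return colorsys.hsv_to_rgb(class_h, s, base_v)
```

For a mid-grey relief pixel and a green classification color, a membership of 0.5 gives a pale green and a membership of 1.0 a fully saturated green, both at the original brightness.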

3 Conclusions and Future Work 

Although Gaia is in the early stages of development, several key observations
can be made. Firstly, sketching is an effective mechanism for some forms of
classification, even with a simple but viable fuzzy mechanism for classifying
map pixels.

As seen in figure 2(a), combining color characteristics from the underlying 
map with a user specified output color through interpolation in the HSB color 
model provides an effective display, resulting in a visual output that retains 
the underlying terrain characteristics while providing the user with visual cues 
regarding degree of set membership. Saturation appears to be an effective cue for 
belief in this domain, as well as in the evidential reasoning previously explored. 

Since Gaia is in the early stages of development, many opportunities for future
work exist. One such possibility is the extension of Gaia’s simple classification
algorithm to incorporate spatial knowledge, including the incorporation of pixel-
based computations such as those used in BitPict [4]. Alternative classification
algorithms could potentially include the assignment of belief to topologies,
incorporating many existing ideas.

A weakness in this work is that complex queries are not possible; effectively, 
the only queries available are those defined by fuzzy intersection of subsets of the 
map. To be truly useful, extensions to more complex queries, such as negative 
queries and disjunctive query combinations, are necessary. It seems highly likely 
that concepts from Anderson’s Inter-diagrammatic Reasoning[5] could be used 
in extending this work: a number of maps could be created by various simple 
queries on the original map, and combined with fuzzy boolean operators. In this 
way, it would become possible to specify, for instance, farmland which is not 
used for corn. 
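As an illustration of how such map combinations might look, the sketch below treats each query result as a grid of membership degrees and combines grids cell by cell. The choice of min, max and complement (the usual Zadeh operators) is an assumption; the text does not say which fuzzy operators would be used, and the function names are hypothetical.

```python
def fuzzy_not(a):
    """Complement of a membership map (a list of rows of values in 0..1)."""
    return [[1.0 - x for x in row] for row in a]


def fuzzy_and(a, b):
    """Cell-wise fuzzy intersection of two maps of equal size."""
    return [[min(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuzzy_or(a, b):
    """Cell-wise fuzzy union of two maps of equal size."""
    return [[max(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# "Farmland which is not used for corn", with made-up membership values.
farmland = [[1.0, 0.5], [0.25, 0.0]]
corn = [[0.75, 0.25], [0.0, 1.0]]
farmland_not_corn = fuzzy_and(farmland, fuzzy_not(corn))
```

Each intermediate map could be displayed with the same saturation overlay as a simple query, so negative and disjunctive queries would need no new visualization.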

Finally, we anticipate this work should be useful as a component in appli- 
cations such as robot path planning. We are currently investigating such an 
application, in which a user identifies good and poor path elements, assigns cost 
functions to them (a road element will have a much lower cost than a forest), 
and a path planning algorithm selects an optimal path given the cost function. 
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1 Introduction 

Children have problems in understanding and becoming proficient in abstract 
representational systems that convey concepts, such as algebra. We focus here 
on using diagrams to help them learn to translate arithmetic word problems into 
calculator expressions. The problem here is learning how to find a mapping be- 
tween features of the word problems and actions performed on a calculator, and 
how to represent and handle intermediate results, especially when more than 
one operation is involved [2, 5]. We approach it by looking for representations in 
which the relevant information is easy to read; which are easy to edit as required; 
and which encourage reflective abstraction, the means by which students con- 
struct abstract structures by reflecting on their own activities and the arguments 
used in pupil-teacher or pupil-pupil discourse [3]. 

Harrop [3] designed a computer-based environment, ENCAL, in which chil- 
dren could work on solutions to word problems in any of three representations: 
iconic, where concrete terms in the problem were directly represented by pictures; 
data-tree; and a simulated four-function calculator. The data tree was meant to 
serve as an intermediate between the highly concrete iconic representation and 
the very abstract calculator formula. The 3 representations were linked so that 
modifications to one were propagated to the others whenever possible. ENCAL 
was intended for classroom use so that children could be invited to explain what 
they observed, as illustrated below. Pilot studies, however, showed that children
did not fully understand the role of the tree, and when they tried to edit it they got
into a tangle.

In this paper we describe 

1. a small but we think significant change to the conventional tree represen- 
tation, to improve the visual correspondence between tree and formula and 
thus help reasoning; and 

2. two novel approaches to the editing of such trees. 

We conjecture that of these two editing systems, the one that is easier in
conventional HCI terms (Anemone) will prove less likely to stimulate reflective
abstraction, which implies that learning will be harder. The ’harder’ editor
(Carousel) seems likely to lead to easier learning. Though counter-intuitive, this
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fits with Brna, Cox and Good [1], who argue that it is not always educationally 
desirable to use the diagrammatic representation that makes the task easiest, 
simply because that will not encourage reflective abstraction. 

2 Improved Visual Correspondence 

Effective reasoning with external representations such as diagrams requires 
a good match between the demands of the task and the type of information 
read-off afforded by the representation. With hindsight we realised that although 
the conventional representation of a data tree, as used by ENCAL (Figure 1, left), 
allows easy read-off of some aspects of the information, e.g. chains of depen- 
dencies, it unnecessarily obscures exactly what is essential to our purpose: the 
correspondence between tree and linear expression. 

First, the parentheses in the linear representation have no match in the 
tree, even though their occurrence can be deduced from information in the tree. 
ENCAL ’s representation addressed this by introducing rectangles within the tree 
(Figure 1, right). 

Next, the x-y placement of nodes in the conventional tree, drawn to ’look
pretty’, has no direct correspondence to their location in the linear representa- 
tion. Such aesthetic-based criteria do not necessarily conform to ease-of-reading 
[6]. Hence learner children may find it impossible to relate a node in the tree 
to its corresponding symbol in the algebra; they simply cannot work out which 
node corresponds to which symbol. We have a simple but effective fix: tree nodes 
are drawn vertically below their corresponding symbols in the expression. 



ENCAL-style tree without rectangles. 

The equivalent linear expression, 
2+(7+8)x3, was presented as part of the 
virtual calculator, with no clear visual 
correspondence between the two repre- 
sentations (See web site for screenshot). 





ENCAL-style tree with a rectangle cor- 
responding to the parentheses in the 
linear expression (not shown). An im- 
provement, but not enough. 




Fig. 1. ENCAL-style tree 
Lastly: what does the tree really mean? We have chosen to interpret it as the
order of evaluation of an expression, since that description fits with typical
teaching; and to ensure that order can easily be read off from our trees, we
ensure a second property, again simple but effective: heights of nodes in the
tree correspond exactly to the order of evaluation. Higher nodes are always
evaluated before lower nodes. This addresses one of the most difficult problems
in teaching the evaluation of these expressions, which is to teach children the
order in which sub-expressions should be evaluated. Figure 2 illustrates the
improved visual representation.
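The two layout properties (each node sits below its symbol, and height encodes order of evaluation) can be sketched as a small computation. This is an illustrative reconstruction, not the authors' code: it assumes standard precedence with left-to-right evaluation for ties, writes multiplication as `*` rather than x, and uses the token index as the horizontal position.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def evaluation_order(tokens):
    """Return the token indices of the operators in the order they are evaluated."""
    order, stack = [], []        # stack holds indices of pending operators and "("
    for i, tok in enumerate(tokens):
        if tok == "(":
            stack.append(i)
        elif tok == ")":
            while tokens[stack[-1]] != "(":
                order.append(stack.pop())
            stack.pop()          # discard the matching "("
        elif tok in PRECEDENCE:
            # Operators that bind at least as tightly are evaluated first.
            while (stack and tokens[stack[-1]] != "("
                   and PRECEDENCE[tokens[stack[-1]]] >= PRECEDENCE[tok]):
                order.append(stack.pop())
            stack.append(i)
    while stack:
        order.append(stack.pop())
    return order


def layout(expression):
    """Map each operator's column (token index) to its level; level 1 is drawn highest."""
    tokens = re.findall(r"\d+|[()+\-*/]", expression)
    return {col: level
            for level, col in enumerate(evaluation_order(tokens), start=1)}
```

For `2+(7+8)*3` the inner + (column 4) is placed at level 1, the * (column 7) at level 2 and the outer + (column 1) at level 3, so reading the tree from top to bottom gives the order of evaluation.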

Our new representation introduces a new problem, however. Because addition 
is associative, in expressions like a+b+c either operator could be evaluated first. 
What shall we do in the tree: shall we allow operators to be at the same height, to 
denote that order is immaterial? If so, what shall we do with the first expression, 
a+bxc, where of course order is important? We consider this issue in the next 
section, as it is part of the distinction between the editors we have devised.




Fig. 2. Anemone/Carousel tree showing 
improved visual correspondence between 
tree nodes and the symbols of the linear ex- 
pression, presented directly above the tree 



3 Operations on Trees 



The ENCAL environment allowed children to edit trees directly with the usual
operations: delete/insert/move arc, delete/insert/move/change operator or number.

In HCI terms, that editor can (with hindsight) be strongly criticised. Many
low-level operations are required to achieve one operation ’in the head’ (see
Appendix). Children using ENCAL found it nearly impossible to edit a tree
successfully [3].

We have developed two new editors using the representational principles above
(also including a simple animation to morph between the tree and the linear
expression). They are highly specialised demonstration versions, focused on the




Fig. 3. Two operators are moved, + down and x up, changing the interpretation of
the tree. This requires 2 actions in Carousel (move + down, move x up) but only 1
in Anemone (select +, press Remove Brackets)
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problem that most perplexed children in the ENCAL studies, viz. how to change 
bracketing. 

The first, Anemone, has direct operations for adding and removing brackets, 
so that a+bxc can readily be converted to (a+b)xc : just select the operator 
and click the ’add brackets’ button. The tree reshapes itself accordingly (see 
Appendix) . 

The second editor, Carousel, directly changes the order of evaluation. Any 
operator in the tree representation can be selected and moved up or down one 
step in the tree. By moving + higher and x lower, or vice versa, the user can 
change a+bxc to (a+b)xc and back (Figure 3). 

Carousel requires more operations than Anemone - about twice as many
for inserting brackets: select the operator and then press Move Up (or Down)
as often as needed. Carousel also allows trees to enter uninterpretable states
(Figure 4): under the strict rule introduced above that height corresponds to
order of evaluation, a tree with two operators at the same height is ambiguous.
In a strict HCI sense, Carousel is harder. We regard this as a feature, not a
bug, as argued in the next section.
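The relationship between operator heights and bracketing in Carousel can be sketched as follows. This is a hypothetical reconstruction, not the editor's code: operators carry a level number (1 is highest and evaluated first), any repeated level is treated as ambiguous under the strict rule, and brackets are emitted only where standard precedence would otherwise give a different order.

```python
PRECEDENCE = {"+": 1, "-": 1, "x": 2, "/": 2}


class AmbiguousTree(Exception):
    """Raised when two operators share a level."""


def build(operands, operators, levels):
    """Build a nested (left, op, right) tree; the lowest operator becomes the root."""
    if len(set(levels)) != len(levels):
        raise AmbiguousTree("two operators are at the same level")
    if not operators:
        return operands[0]
    # The operator evaluated last (largest level number) splits the expression.
    root = max(range(len(operators)), key=lambda i: levels[i])
    left = build(operands[:root + 1], operators[:root], levels[:root])
    right = build(operands[root + 1:], operators[root + 1:], levels[root + 1:])
    return (left, operators[root], right)


def to_expression(tree, parent_prec=0, right_child=False):
    """Linearise a tree, adding brackets only where precedence requires them."""
    if not isinstance(tree, tuple):
        return str(tree)
    left, op, right = tree
    prec = PRECEDENCE[op]
    text = (to_expression(left, prec) + op
            + to_expression(right, prec, right_child=True))
    needs_brackets = prec < parent_prec or (right_child and prec == parent_prec)
    return "(" + text + ")" if needs_brackets else text
```

With operands a, b, c and operators +, x, the levels [2, 1] give `a+bxc` and the levels [1, 2] give `(a+b)xc`; the levels [1, 1] raise `AmbiguousTree`, which corresponds to the warning state in which no lines are drawn.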

4 Encouraging Reflective Abstraction 

Part of the aim of this project was to find a representation that would encourage 
reflective abstraction, especially by inviting children to explain what they ob- 
served in pupil-teacher discourse. Here is a fragment of a pilot session between 
a teacher (T) and a 13-year old pupil (G) of high mathematical ability exploring 
Carousel. It illustrates the type of reflection that we hope to provoke. 

T: Would you like to try moving operations
up and down?

G moves an operation up. Gets a mes- 
sage that the tree is ambiguous. (Break- 
down - reflection starts) 

G: Why is this tree ambiguous? I see, it 
is not clear which of these two can be 
performed first. ( Understands the ab- 
stract structure ) 

G: What will happen if I move this op- 
eration one more level up? 

He moves the operation and a new tree 
is drawn. G again needs to move the 
operation several times to grasp the 
changes on the tree. 

G: I see - moving an operation up will 
make it be performed earlier, moving 
down means that it will be performed 
later. 



[Figure: Carousel screenshot. Warning shown: “This tree is ambiguous - two boxes
are at the same level so we don’t know which to do first”]

Fig. 4. Carousel ambiguous tree. No lines are drawn between nodes because the
order of evaluation is indeterminate. The warning message and the thick outlines
are shown in red



Teaching Children Brackets by Manipulating Trees: Is Easier Harder? 411 



Reflective activities happened both when the child was working on his own, 
when his expectations were confounded, and also when a tutor asked him to 
predict the result of an operation, point at features of the abstract structure, or 
to provide explanations. 

Unlike Carousel, Anemone creates few impasses. The same child quickly be- 
came bored with it, and it gave the tutor few opportunities to provoke reflection. 

5 Conclusions and Future Work 

We regard the pilot session as very promising; and if our surmise is correct, 
it indicates that the easiest tool to use may not be the most suitable for all 
purposes. We hope to find resources to implement stable versions of Carousel 
and Anemone within ENCAL for classroom evaluation of the improved system. 
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Appendix 

Manipulating Trees in the ENCAL Design 



Start: 2+3x4+7 Target: 2+3x(4+7) 



Step 1: Start position. In this design, the linearised version (not shown) is
off to one side in a different window-pane.

Step 2: Disconnect 3, 4, 7, +, and x. (By now it’s easy to forget what you’re
trying to do.)

Step 3: Move 4, 7 and + and reconnect (by right-clicking on +).

Step 4: Add other arrows similarly.
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Manipulating Trees in the Carousel Design 



Start: 2+3x4+7 Target: 2+3x(4+7) 



Step 1: Start position. To assist in locating correspondence between the
linearised version and the data tree, the operators are uniformly distributed
in both representations and are horizontally aligned.

Step 2: The + operator has been raised. That makes the tree now ambiguous
(since x and + are at the same height) so no arrows can be drawn. The offending
operators are highlighted in red.

Step 3: The x operator has been lowered. The tree is now unambiguous and the
arrows are restored. The parentheses needed in the linearised version are
indicated by the dashed rectangle.



414 



T.R.G. Green et al. 



Manipulating Trees in the Anemone Design 



Start: 2+3x4+7 Target: 2+3x(4+7) 





Step 3: The ’add brackets’ button 
has been pressed and the tree is 
automatically rebuilt. 



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Extended Abstract 

Smart Diagram Environments (SDEs) are software applications that use struc- 
tured diagrams to provide a natural visual interface [5]. Such an environment 
behaves as if the computer “understands” the diagram, for example by providing 
manipulation that takes into account the diagram’s structure and its intended
semantics. We present Cider¹, a Java toolkit for building SDEs which greatly
simplifies this task. Cider is a generic component-based system which is designed
to be easily embedded in Java applications. It provides automatic interpreta- 
tion of diagrams as they are constructed and manipulated, structure preserving 
manipulation, and a powerful transformation system for specifying diagram ma- 
nipulations. Cider’s main innovation is its component-based approach to SDE 
development which provides substantially increased architectural flexibility to 
the application programmer. 

SDEs are useful in a wide variety of applications such as high-level query 
languages for spatial information systems, CAD systems, diagrammatic theorem 
provers, and on-line education. As a simple example, consider an SDE that is 
designed to teach students about Finite State Automata (FSAs). It would allow 
the students to create and modify FSA diagrams, it would visually demonstrate 
how to construct a FSA from a regular expression and whether a particular in- 
put string belongs to the language of the FSA, and visually demonstrate how to 
construct a deterministic FSA from a non-deterministic FSA, and how to min- 
imise it. It might also provide automatic layout and beautification (i.e. “pretty 
printing” ) . 

Unlike a standard graphics editor such as xfig, the SDE would understand
the structure of FSAs and automatically interpret the FSA diagram drawn by 
the user. In principle, the user constructs these diagrams through basic drawing 
operations from primitive graphical objects, such as lines and circles. As an ex- 
ample consider the FSA shown in Figure 1. The SDE should identify the possible 
transitions and recognize that the two concentric circles on the right of the
diagram and the text, S3, that lies within them form a single unit which represents
a final state. Later, if one of the circles or the text is moved during user editing

1 Cider is freely available under the terms of the GNU General Public License at:
http://www.esse.monash.edu.au/~tonyj/CIDER/
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Fig. 1 . A simple finite state automaton 



the other components of the state 
should move with it as well as any 
transitions to or from the state. The 
user should also be warned if they 
have drawn an invalid FSA, for in- 
stance if there is an unlabelled transi- 
tion or no start state. 
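The two validity warnings mentioned here can be illustrated on an already interpreted FSA. The data types and the `validate` function below are hypothetical and written in Python purely for illustration; they are not part of Cider, which is a Java toolkit and derives the structure from the drawing through its grammar-based interpretation.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    source: str
    target: str
    label: str = ""


@dataclass
class Fsa:
    states: set
    start_states: set
    final_states: set
    transitions: list = field(default_factory=list)


def validate(fsa):
    """Return a list of warnings; an empty list means no problem was found."""
    warnings = []
    if not fsa.start_states:
        warnings.append("no start state")
    for t in fsa.transitions:
        if not t.label.strip():
            warnings.append(f"unlabelled transition {t.source} -> {t.target}")
    return warnings
```

In an SDE such a check would run after each incremental re-interpretation of the diagram, so that the warning appears as soon as the drawing becomes invalid.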

SDEs are potentially very useful, 
but unfortunately not easy to build. 
Luckily, there are a number of com- 
mon functions that every SDE has to 
provide and implementing these subtasks can be supported by generic software 
tools. In particular, most SDEs have to provide modelling of the diagram compo- 
nents, interpretation, structure preserving manipulation, visual transformation, 
and layout/beautification. A logical approach is to build a generic SDE system 
which provides these capabilities but which is then customised for a particular 
diagrammatic language based on a formal high-level specification of that lan- 
guage. This is the idea behind the tools DiaGen [7], GenEd [3], Recopla [6], 
Penguins [2] and GenGed [1]. However, given the potential usefulness of SDEs, 
it is interesting to ask why these systems are not commonly used (except by those
who developed them). 

We believe a major reason is the lack of architectural flexibility of the above 
systems. They all severely restrict how the SDE application programmer must 
write the remainder of the application and also the application GUI. The prob- 
lem is that they provide a large single system which “wraps around” the main 
application code. This appears to be the wrong approach, as the main applica- 
tion code and not the user-interface should take center stage. A second reason 
why previous SDE systems are not widely used is that they are each limited 
in their capabilities (although in different ways). For instance, only Penguins 
provides (very) limited diagram beautification, while it does not provide trans- 
formations. This problem is compounded by their underlying architecture which 
makes them difficult to extend. Finally, existing SDE construction systems are 
not compatible with commonly used toolkits for GUI development and so require 
significant investment by an application programmer. 

To overcome the limitations of the previous toolkits we have developed 
Cider, a new toolkit for building SDEs. Cider provides the following capa- 
bilities: 



— Automatic interpretation of diagrams as they are constructed and modified 
based on Constraint Multiset Grammar (CMG) specifications of diagram- 
matic syntax. 

— Structure preserving manipulation. This is provided by an application spe- 
cific constraint solver. 

— A powerful transformation mechanism for specifying diagram manipulations 
and user interactions that is fully integrated with the incremental parser, 
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allowing transformations to be couched in terms of high-level diagram com- 
ponents yet understood in terms of low-level displayable components. 

— Efficiency is achieved by compiling grammars, transformations and con- 
straint code. 

— Although Cider does not yet provide diagram layout or beautification its 
architecture allows the application to provide this. 

However, probably the most important innovation of Cider is its component-based
architecture. Rather than wrapping around application-specific modules, Cider
consists of generic components which the application wraps around. Such a
component-based architecture provides much more flexibility for the application
programmer and does not impose any restrictions on how the main application is
programmed. It means, for instance, that the application programmer has complete
control over the interface, can choose not to use some components of the system,
and can even extend the toolkit’s capabilities by providing additional components,
such as a customized multi-way constraint solver.

To further facilitate Cider’s integration into real-world applications, Java 
was chosen as the implementation language as Java is one of the most com- 
monly used languages for implementing applications that require graphical user 
interfaces. Other benefits of Java include its platform independence, and its sup- 
port for component-based programming as well as a whole culture of component 
exchange and re-use. Another feature of Cider is that it allows the diagrammatic 
language grammar to be specified using XML. 

The components that make up the Cider toolkit are shown in Figure 2, which 
also indicates how these components are used in the creation of an application. 
The white boxes indicate components of Cider, the cross-hatched boxes indicate 
optional components that can be tailored to extend the capabilities of the toolkit 
itself. The shading indicates components that must be created by the application 




Fig. 2. The components that make up the Cider toolkit 
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Fig. 3. A screenshot of an FSA editor created with the Cider toolkit 



developer. The double-headed arrows indicate which Cider components interact 
with which other components. 

Using the Cider toolkit, we have successfully created an example FSA edi- 
tor that includes incremental interpretation, structure preserving manipulations, 
and implements transformations that allow input strings to be processed as well 
as allowing a FSA to be created from a regular expression (see Figure 3). Our 
directions for future work involve extending the toolkit to provide beautification 
and automatic layout, and performing further case studies to evaluate the use 
of Cider for more complex SDEs such as for UML diagrams. 

For a description of Cider that focuses more on its transformation and con- 
straint solving capabilities, see [4]. 
Abstract. In this paper, an overview of the general construction principles
of an interactively animated diagrammatic system, called the diagrammatic
spreadsheet, is described. An example of the use of the system in
interactive exploration of diagrams is given.



1 Introduction 

Animated diagrams may acquire much higher functionality when the construc- 
tion and animation process can be made interactive in the sense of full control 
by the user over dynamic changes of the diagram or its parts. A number of examples
of such diagrams have been implemented for some special cases. However, they
usually allow the user to manipulate a diagram’s elements only in a pre-programmed
way, but not to easily define his/her own diagrams and manipulate the relevant
constraints. The diagrammatic spreadsheet concept, inspired by the graphics
simulation system ThingLab developed by Borning [2], is an attempt at filling 
this gap by developing a fully interactive animated diagrammatic system. The 
construction of the system is based on the internal hypergraph representation 
called a CP-graph with the realisation scheme concept [4]. A constraint that 
describes a relation between diagram components is used here like a formula 
in a spreadsheet which automatically recomputes the value of a cell (specifying 
some graphical element’s attribute) whenever any of the other cells change. The 
system will serve as an exploratory tool for a researcher, or as a front-end to 
a more automatic diagrammatic reasoning system, where the dynamic process of 
drawing is more important than the finished diagram. Due to space limitations, 
the presentation is restricted to an overview of the general idea and a simple 
example illustrating the use of the proposed system. For details, see [6]. 
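The spreadsheet analogy can be made concrete with a small sketch of one-way recomputation. It is an illustration under stated assumptions, not the system's implementation: the class names `Cell` and `FunctionalConstraint` are hypothetical, propagation is one-directional from inputs to output, and cyclic constraints are not handled.

```python
class Cell:
    """One attribute of a graphical element, e.g. the x-coordinate of a point."""

    def __init__(self, value=None):
        self.value = value
        self.dependents = []     # constraints to re-run when this cell changes

    def set(self, value):
        self.value = value
        for constraint in self.dependents:
            constraint.update()


class FunctionalConstraint:
    """Keeps `output` equal to func(*inputs), like a spreadsheet formula."""

    def __init__(self, func, inputs, output):
        self.func, self.inputs, self.output = func, inputs, output
        for cell in inputs:
            cell.dependents.append(self)
        self.update()

    def update(self):
        self.output.set(self.func(*(c.value for c in self.inputs)))


# The x-coordinate of a midpoint M follows the endpoints A and B.
ax, bx, mx = Cell(0.0), Cell(10.0), Cell()
FunctionalConstraint(lambda a, b: (a + b) / 2, [ax, bx], mx)
ax.set(4.0)   # dragging A recomputes M: mx.value is now 7.0
```

A constraint solver of the kind the paper describes would also have to cope with multi-directional and geometric constraints; the sketch only covers the simplest functional case.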

2 General Scheme 

The system structure can be divided into the following, partially overlapping 
modules: 

* The research has been partially supported by the grant No. 5 T01F 002 25 (for 
years 2003-2006) from State Committee for Scientific Research (KBN). 
[Figure: example constraints. (a) “Fixed” values (distance r, angle a), “Distance”,
“Functional” constraint f(x1, x2, ..., xn) = y; (b) “On”, “In”; (c) “Intersection”,
“Perpendicular”]

Fig. 1. Example numerical (a), topological (b) and geometrical constraints (c)



1. Diagram editor used to construct and edit objects and constraints. 

2. Generation and maintenance of the internal representation of a diagram,
based on graph grammars and a hypergraph transformation model.

3. Constraint solver controlling diagram transformations. 

With a set of general graphical primitives, the user interface module allows
the user to construct geometric figures using the compass-and-ruler paradigm [3].
The set includes, among others, points, lines, line segments, polylines, polygons,
elliptic arcs and textual strings. Each primitive can be specified by a set of its
graphical properties like location, shape, color, etc. The user can group primitives
to construct complex objects, as well as define and edit the constraints binding
them. Once an object is created, the user can interactively manipulate its visual
representation, changing its properties.

Internal representation is based on the concept of CP-graphs [ ]. Diagrams 
are represented as hypergraphs, with each component represented as a hypernode,
while hyperedges describe the relationships (constraints) between components. 
An object can be specified with several different representations, according to 
the choice of individual translation rules created by the system designer. For 
more details on how such hypergraphs are created and maintained, see [6]. The 
realisation scheme [ ] prescribes the mapping between the model and graphical 
appearance of its constituent parts in a displayed diagram. It allows a single 
internal graph to model several different graphical realisations of the diagram 
associated with it through different realisation schemes. 
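
The representation described above can be illustrated with a minimal sketch. It is not the CP-graph formalism itself: the class names, fields and the `realise` helper are hypothetical, chosen only to show components as hypernodes, constraints as hyperedges, and one graph rendered through two different realisation schemes.

```python
from dataclasses import dataclass, field


@dataclass
class Hypernode:
    """One diagram component, e.g. a point or a line segment."""
    name: str
    kind: str                       # e.g. "point", "ls" (line segment)
    attrs: dict = field(default_factory=dict)


@dataclass
class Hyperedge:
    """A constraint relating any number of components."""
    label: str                      # e.g. "on", "parallel"
    members: tuple                  # names of the hypernodes it binds


@dataclass
class Hypergraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_edge(self, label, *members):
        missing = [m for m in members if m not in self.nodes]
        if missing:
            raise KeyError(f"unknown components: {missing}")
        self.edges.append(Hyperedge(label, tuple(members)))

    def constraints_on(self, name):
        """All constraints in which the named component takes part."""
        return [e for e in self.edges if name in e.members]


def realise(graph, scheme):
    """Map each hypernode to a drawing description via a realisation scheme.

    `scheme` maps a component kind to a function; swapping the scheme
    yields a different rendering of the same internal graph.
    """
    return [scheme[n.kind](n) for n in graph.nodes.values()]


g = Hypergraph()
g.add_node(Hypernode("A", "point", {"x": 0.0, "y": 0.0}))
g.add_node(Hypernode("B", "point", {"x": 4.0, "y": 0.0}))
g.add_node(Hypernode("AB", "ls", {"ends": ("A", "B")}))
g.add_edge("on", "A", "AB")
g.add_edge("on", "B", "AB")

plain = {
    "point": lambda n: f"dot {n.name} at ({n.attrs['x']}, {n.attrs['y']})",
    "ls": lambda n: f"segment {n.attrs['ends'][0]}-{n.attrs['ends'][1]}",
}
labels_only = {
    "point": lambda n: f"label {n.name}",
    "ls": lambda n: f"label {n.name}",
}
```

Calling `realise(g, plain)` and `realise(g, labels_only)` on the same graph gives two different textual "renderings", which is the point of separating the model from its realisation scheme.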

Constraints are used to specify the relations between diagram components.
They are used here like formulas in a spreadsheet, which automatically
recompute the value of a cell (specifying some graphical element’s attribute)
whenever any other cells bound by the constraints change. Constraints can bind
explicit parameters like distances and angles, as well as establish relations
between whole objects, like incidence, parallelism, perpendicularity,
collinearity, etc. (see Fig. 1). Functional constraints (specified as algebraic
formulas) can calculate new numerical values of parameters. When the attributes
of a graphical object change, the relevant constraints are enforced, and the
affected graphical objects are automatically changed and redrawn. That may
change the structure of the transformed diagram, causing some new information
to emerge, or even creating new diagram elements representing information
gathered during the transformations.

Fig. 2. The hypergraph representation (b) of the static diagram (a) and different
rendering of the diagram (d) after hypergraph transformations producing (c)
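
The spreadsheet analogy can be made concrete with a small sketch of one-way functional constraints. The `Cell` and `Functional` classes are hypothetical and are not taken from the described system; the sketch assumes the constraint network is acyclic and does no cycle detection.

```python
class Cell:
    """An attribute value that notifies dependent constraints on change."""

    def __init__(self, value=None):
        self.value = value
        self.dependents = []        # constraints that read this cell

    def set(self, value):
        if value != self.value:
            self.value = value
            for constraint in self.dependents:
                constraint.enforce()


class Functional:
    """One-way constraint: output = f(inputs), like a spreadsheet formula."""

    def __init__(self, f, inputs, output):
        self.f, self.inputs, self.output = f, inputs, output
        for cell in inputs:
            cell.dependents.append(self)
        self.enforce()              # establish the relation immediately

    def enforce(self):
        self.output.set(self.f(*(c.value for c in self.inputs)))


# Two interior angles of a triangle determine the third.
alpha, beta, gamma = Cell(60.0), Cell(70.0), Cell()
Functional(lambda a, b: 180.0 - a - b, [alpha, beta], gamma)
first = gamma.value       # value right after the constraint is created
alpha.set(90.0)           # "dragging" changes alpha; gamma is recomputed
second = gamma.value
```

A full solver would also need multi-way constraints (for example, keeping two lines parallel whichever one is dragged); this sketch covers only the functional case.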

3 An Example

Some features of the system will be shown using a simple diagrammatic proof
(see Fig. 2a), formulated as follows [1]. Consider a given particular triangle. The
user can grasp one end of an extended triangle side and rotate it, as indicated,
with the rest of the diagram reshaping itself automatically to maintain all
relevant constraints (e.g., the position of the rotation point and the indicated
parallelism of lines), showing that for all positions of the side the sum of the
angles remains the same, namely 180°.
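
A numerical counterpart of this animation can be sketched as follows. It only checks the invariant for a few positions of the rotated side and is no substitute for the diagrammatic argument; the function names and coordinates are illustrative assumptions.

```python
import math


def apex(a, b, alpha_deg, theta_deg):
    """Intersect the line leaving a at alpha_deg with the line leaving b
    at theta_deg (both angles measured from the positive x axis)."""
    al, th = math.radians(alpha_deg), math.radians(theta_deg)
    da = (math.cos(al), math.sin(al))
    db = (math.cos(th), math.sin(th))
    denom = da[0] * db[1] - da[1] * db[0]
    if abs(denom) < 1e-12:
        raise ValueError("the two sides are parallel; no triangle")
    w = (b[0] - a[0], b[1] - a[1])
    s = (w[0] * db[1] - w[1] * db[0]) / denom
    return (a[0] + s * da[0], a[1] + s * da[1])


def interior_angles(a, b, c):
    """Interior angles (degrees) of triangle abc, at a, b and c."""
    def angle_at(p, q, r):
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return math.degrees(math.atan2(abs(cross), dot))
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)


A, B = (0.0, 0.0), (4.0, 0.0)
# Keep the side through A fixed at 60 degrees and rotate the side through B.
sums = [sum(interior_angles(A, B, apex(A, B, 60, theta)))
        for theta in (80, 100, 120, 150)]
```

Each entry of `sums` agrees with 180 up to floating-point error, whatever direction is chosen for the rotated side (as long as the two sides still meet above the base).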

The first step in using the system consists of translating the static
diagram (Fig. 2a) into its hypergraph form (Fig. 2b). In the hypergraph, we can
see, among others, hypernodes represented by small rectangles with the names of the
objects (e.g., "ls" means line segment and "hl" means half line), as well as various
constraints marked as labels of hyperedges, e.g., intersection, parallelism, or the
topological constraint "on", which defines the situation of a point lying on a line.

The resulting hypergraph, including the constraints described in the diagram,
may now be automatically transformed according to the user’s actions. In this
way, the user can interactively transform, or animate, the diagram by
continuously changing some attribute of a dragged object (e.g., the directional
angle of the half line l1 by rotating it around the fixed point B). The
hypergraph structure with embedded constraints then changes other attributes
appropriately (e.g., the direction of the half line l3), maintaining the
integrity of the model as specified by its structure. This interactive
animation mode of using the system is very useful for finding new properties of
objects represented diagrammatically [5]. The second, more automatic mode, which
involves the use of graph transformation rules defined by the user, may lead to
the hypergraph of Fig. 2c, with its contents rendered as in Fig. 2d; it is
described in more detail in [6].

Abstract. Diagrams mediate thinking and understanding largely
through the human visual system’s innate ability to perceive visuo- 
spatial structure. Tools for working with diagrams will benefit from the 
ability of machines to identify visual structure in concert with their hu- 
man users. This poster and its companion summarize recent progress in 
perceptually-supported diagram creation and editing. In particular, our 
research group has deployed a document image editing program realiz- 
ing some measure of Gestalt Perceptual Organization for sketches and 
diagrams. 



1 Background: Diagrams as Input versus Diagrams as 
Intrinsic Representations 

Over the past decade many investigators have identified the goal of building 
“intelligent” computer programs for human work with sketches and diagrams. In 
their envisionments, many of which have experimental implementations, the user 
draws and annotates in freehand strokes using pen and paper or digital stylus on 
instrumented whiteboards or tablets. The machine parses and recognizes these 
markings, then responds by cleaning up or “beautifying” drawings, accessing 
databases, performing simulations, or invoking reasoning engines [15, 8, 9, 5, 1, 
16, 14, 7, 13]. 

By and large, most work in this area is focused on providing naturalistic pen 
interfaces as input mechanisms — input to graphics formatting and presentation 
programs, input to database search queries, input to reasoning systems, etc. The 
value-added smarts of the computer is viewed as residing in domain knowledge 
and competence, while the greatest barrier to actualizing this competence re- 
mains the difficulty of achieving robust and accurate machine interpretation of 
diagrammatic input. 

To mitigate this problem, a common and sensible strategy has been to place
strong limits on what may be drawn by the user, and apply correspondingly
strong prior constraints about what can be recognized by the system [2]. For
example, one group has demonstrated the ability to interpret closed polygonal
paths as two-dimensional projections of solid objects [1], while another has shown
that arrows drawn in idiosyncratic ways can be recognized in the symbology of
military diagrams [5]. In systems of these designs it is important that ambiguity
in the interpretation of any given input marking be eliminated quickly. To do
otherwise is to risk combinatorial explosion in the mapping of input primitives
(e.g., pen strokes) to the components of domain model shapes.

We observe, however, that another stance is possible with regard to building
smart, perceptually-enabled diagram manipulation tools. This approach focuses
less directly on mapping from a diagram to domain content, and instead views
the diagram as an object of interest to manipulate in its own right, i.e., as a
form of external working memory. The process of drawing and elaborating and 
re-working a diagram engages the perceptual system to form multiple overlap- 
ping interpretations consisting of different segmentations and groupings of the 
primitive markings. On this hypothesis, these motor and perceptual processes 
in turn engage cognitive processes to conceptualize domain content in different 
ways, corresponding to alternative, sometimes partial, sometimes only partially 
coherent, “parses” of the diagram. 

2 Perceptual Organization and Diagram Analysis 

Perceptual segmentation and grouping processes are associated with the Gestalt
principles of Psychology, dating originally to the early 20th century but receiving
attention in the contemporary study of Computer Vision [17, 12, 6, 11, 18, 3, 4,
10].

While human vision is remarkably adept at recognizing known objects or
object categories in complex scenes, it is equally capable of finding patterns
and creating sense out of utterly unfamiliar imagery. Aspects of visual scene
analysis occurring apart from object recognition per se include figure/ground 
segmentation; segmentation of regions into coherent objects; assigning relative 
depths to surfaces; detection of potentially interesting or novel events; factoring 
shadows and other lighting effects from geometrical properties; tracking moving 
objects; and detecting coherent motion among disparate motion cues. The eco- 
logical rationale for visual systems possessing these abilities has been discussed 
at length [18]. The Gestalt psychologists set out to understand perception via 
simple figures that distill salient visual properties or pattern qualities that in 
natural scenes are found in confluent abundance. As markings on paper, many 
of these figures bear notable similarity to the representational constituents of 
diagrams. 

The Gestalt principles of primary concern in diagram analysis include Smooth
Continuation, Figural Closure, Spatial Proximity, Symmetry, and Feature
Similarity (see Figure 1). While these principles offer explanatory power in
accounting for human perceptual capabilities, they have proven extremely
difficult to formalize as algorithms for computer programs. Although each of
these principles is intuitively associated with formal geometrical properties
(e.g., smoothness ~ continuity in the derivative of contour tangent direction;
closure ~ a topological donut), the perceptually relevant phenomena are “fuzzy,”
or tolerant to deviations from straightforward mathematical formulations.
Moreover, these principles interact and trade off with one another. Computer
Vision lacks any adequate formulation for uniting the various Gestalt phenomena
under a common theoretical or algorithmic framework.

Fig. 1. Illustrations of five Gestalt principles of visual perceptual organization.
a. The central figure appears to be a combination of parts having smooth boundaries.
b. The top dot appears to fall on a foreground object defined by a nearly closed
boundary contour, while the bottom dot appears to lie on the background. c. The
apparent visual partitioning of the text based on proximity overrides a partitioning
based on semantic content. d. The curves that appear to go together are the pair
forming a bilateral symmetry. e. The curve appears to continue from point B on the
basis of similarity of its local properties.
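
To make the difficulty concrete, here is one deliberately crude formalization of a single principle, Spatial Proximity, as single-linkage grouping with a distance threshold. This is an illustrative assumption, not a method from the work cited here; the function name and the `max_gap` parameter are hypothetical.

```python
import math


def group_by_proximity(points, max_gap):
    """Single-linkage grouping: two marks share a group when a chain of
    neighbours, each at most max_gap apart, connects them."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


marks = [(0, 0), (1, 0), (2, 0), (6, 0), (7, 0)]
tight = group_by_proximity(marks, max_gap=1.5)   # two groups of marks
loose = group_by_proximity(marks, max_gap=4.0)   # everything merges
```

The grouping depends entirely on the chosen threshold, which is one small instance of the "fuzziness" noted above: no single value of `max_gap` matches what a viewer perceives across all drawings, and the sketch ignores interactions with the other principles.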

Nonetheless, work in our laboratory suggests that tools for working with di- 
agrams can benefit from the ability to identify visual structure in accordance 
with the Gestalt laws, even at today’s relatively primitive level of technologi- 
cal development. Perceptual grouping is useful in at least two ways. First, as 
machines become capable of perceiving visual structure corresponding to that 
readily identified by human users, user interface techniques can be devised that 
permit people to select and manipulate salient collections of visual markings at 
will, and thereby reconfigure diagrams according to their imaginations. Second, 
successful perceptual organization offers a stable foundation for recognition of 
symbols, shapes, notations, and domain objects. Not only do visually salient 
segmentation and grouping raise significant structure above noise, clutter, and 
imprecision in drawing and imaging, but these processes inherently make avail- 
able alternative interpretations of locally ambiguous image data, which simplifies 
processes for matching configurations of lines and symbols to libraries of known 
shapes and notations. 

The companion poster to this one presents a prototype application, called
ScanScribe, that demonstrates how visual structure at the level of Perceptual
Organization can be exploited to facilitate selection and editing of visually salient
figural objects in diagrams.

ScanScribe: Perceptually Supported Diagram Image Editing

Abstract. We have implemented a prototype document image editor
that incorporates principles and algorithms for Perceptual Organization 
in order to facilitate selection and manipulation of visually salient image 
objects. Called ScanScribe, the program serves two purposes. First, it 
offers an illustration of the advantageous use of Gestalt principles of seg- 
mentation and grouping of text and line-art found in diagrams. Second, 
it is a practical tool for modifying existing diagrams and composing new 
diagrams from mixed source material. ScanScribe’s user interface design 
and foundational representations are designed to scale to support recog- 
nition of domain objects found by structural matching through subgraph 
correspondence or other techniques. 



1 Introduction 

Over the past decade many investigators have identified the goal of building 
“intelligent” computer programs for human work with sketches and diagrams. 
The companion poster to this one discusses this motivation and associated ex- 
perimental artifacts. This poster presents our efforts to build a practical image 
editing tool embodying some of the perceptual abilities of human users. 



2 Artifact: The ScanScribe Image Editor 

A fundamental operation of image and document editing programs is selection. 
Once the user has selected certain desired markings, they may proceed to move, 
copy, delete, rotate, scale, or otherwise modify them. Our research has led to 
the implementation of a number of programs for perceptually-supported selec- 
tion of diagrammatic material. The most mature of these, called ScanScribe, is 
available for download and will be offered as a demonstration to accompany this 
poster [4]. ScanScribe implements basic forms of foreground/background
separation, object segmentation, and Gestalt grouping. Grouping principles
currently implemented are smooth continuation of line-work, the detection of
perceptually (but not necessarily geometrically) closed paths in line-art [3],
and grouping of text elements by spatial proximity and alignment. See Figure 1.

Fig. 1. Stages of perceptual organization. a. Input image. b. Image fragments
corresponding to individual characters, blobs of touching characters, and simple
curves. c. Grouping rules attempt to identify groups of fragments exhibiting
smooth continuation, perceptually closed contour paths, and spatial proximity
and alignment.



Using ScanScribe, it is easy to modify existing diagrammatic material starting
from scanned images, as Figure 2 illustrates. Here, a practitioner of Category
Theory wishes to illustrate through a diagram that commutativity may be applied
to derive one formal structure from another. After the image of Figure 2a is
imported into the program from a TIFF file, ScanScribe performs a number of
steps that facilitate transformation of the original into the desired figure.
First, foreground markings are distinguished from the white or light background;
the background is then rendered transparent. Next, foreground markings are
decomposed into elemental fragments corresponding to characters and simple
curves. Finally, groups are formed of proximal text and aligning curve
fragments. Sophisticated user interface techniques are used to offer the user a
combination of selection by enclosure and selection by point-and-click. In just
a few minutes Figure 2c is obtained. As part of the process, the practitioner
wished to include an arcing directed arrow between two nodes. They sketched the
arrow freehand on scratch paper, then used a digital camera to import the
scratch paper image into ScanScribe. ScanScribe’s foreground/background
separation algorithms eliminated lighting artifacts, and the arrow was easily
sized and rotated into position. In other words, the program enables the free
and fluid mixture of diverse image material, aspiring to a WYPIWYG (What You
Perceive is What You Get) style of smart interface.

Fig. 2. Image-based editing of a Category Theory diagram using the ScanScribe
image editor. a. Diagram scanned from paper. b. Screen snapshot of material
selected by point-and-click (clearly visible in color as a green halo behind
several image objects). c. Modified figure derived from a, illustrating a point
about a derivation in this domain. Note the added arc-arrow, which was imported
from a digital camera photo of an arrow drawn freehand on scratch paper, then
scaled and positioned within the diagram.
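
The three processing stages named above (foreground separation, decomposition into fragments, grouping) can be sketched in miniature. This is not ScanScribe's implementation: a global threshold stands in for its foreground/background separation, connected components stand in for fragment extraction, and the grouping rule considers horizontal distance only. All function names and parameters are hypothetical.

```python
from collections import deque


def foreground_mask(gray, threshold=128):
    """Pixels darker than `threshold` count as foreground markings."""
    return [[v < threshold for v in row] for row in gray]


def fragments(mask):
    """8-connected components of the mask, each a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            found.append(pixels)
    return found


def group_fragments(frags, max_gap):
    """Merge fragments whose column extents are at most max_gap apart;
    returns [left, right] column spans of the resulting groups."""
    boxes = sorted((min(x for _, x in f), max(x for _, x in f)) for f in frags)
    groups = []
    for left, right in boxes:
        if groups and left - groups[-1][1] <= max_gap:
            groups[-1][1] = max(groups[-1][1], right)
        else:
            groups.append([left, right])
    return groups


W, K = 255, 0   # white background, black ink
image = [
    [W, K, W, K, W, W, W, W, K, W],
    [W, K, W, K, W, W, W, W, K, W],
]
parts = fragments(foreground_mask(image))
words = group_fragments(parts, max_gap=2)
```

Here the three vertical strokes become three fragments, and the two nearby strokes are merged into one group while the distant one stays separate. Real scans, and especially camera images with uneven lighting, would need an adaptive rather than a global threshold.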

3 Prospect: Recognition of Visual Language Structure 

To take smart diagram manipulation to the next level will involve recognition 
of more diagram-specific and domain-specific aspects of structure, above and 
beyond Gestalt principles. For example, Figure 2a is recognizable as an instance 
of a generic class, the Node/Link diagram. As such, one might expect that op- 
erations such as moving nodes should cause their associated links to follow in 
order to preserve syntactic connectivity. To recognize shapes and configurations 
in diagrams such as these, our group has adopted the well-known approach of 
subgraph matching. We have found that perceptual organization techniques offer 
benefits with regard to some of the difficulties these methods have traditionally 
been found to encounter [2, 1].

While the capability to recognize and respect spatial structure at the level 
of visual language constructs is in some respects readily available to today’s 
structured graphics editors, the ability to manipulate in this way, say, any node- 
link diagram obtained from any source remains a significant research goal. To 
achieve it will raise the image itself to the status of common currency between 
perceptually-enabled machines and human visual cognition. 

Abstract. We have developed a diagrammatic logic for theorem proving,
focusing on the domain of metric-space analysis (a geometric domain,
but traditionally taught using a dry algebraic formalism). To evaluate 
its pragmatic value, pilot experiments were conducted using this logic - 
implemented in an interactive theorem prover - to teach undergraduate 
students (and comparing performance against an equivalent algebraic 
logic). Our results show significantly better performance for students us- 
ing diagrammatic reasoning. We conclude that diagrams are a useful tool 
for reasoning in such domains. 



1 Introduction 

Euclidean plane geometry has always been taught using diagrammatic reasoning. 
Traditionally though, only algebraic proofs are allowed in the slippery realms of 
more abstract geometries. We have investigated using diagrams in such a domain,
that of metric-space analysis. It is a hard domain, and even great mathematicians
such as Cauchy have made mistakes in this subject.¹ Students typically find
it daunting, and we conjecture that the dry algebraic formalism used in the 
domain is partially responsible for these difficulties. Currently our logic only 
covers a fraction of the domain, but this was sufficient to run some short tutorials 
on the concept of open sets. This allowed us to experimentally compare our 
diagram logic with an equivalent algebraic logic. 



2 The Logics 

We can only give a brief overview of the logics here.² The algebraic logic uses
natural-deduction rewrite rules. The diagram logic is specified using redraw rules,
which are a visual adaptation of rewrite rules. Redraw rules are defined by an
example diagram transformation; Figure 1 shows an example. The diagrams in our
logic are made up of example objects with relation statements. These statements
can be represented in three different ways: implicitly (where the relation is true
for the objects drawn, e.g. $a \in B$ for $a = 1/2$, $B = [0,1]$), graphically (using
conventions such as ‘a dotted border indicates an open set’) or algebraically.
For both reasoning styles, students were restricted to constructing valid forward
reasoning proofs using equivalent rule-sets. The algebraic proofs produced could
be described as ‘typical text-book proofs’.

¹ See E. Maxwell, “Fallacies in Mathematics”, Cambridge University Press, 1959.

² For more information, please refer to the paper by the same authors in Diagrammatic
Representation and Inference, Springer-Verlag, 2002.

Both logics were implemented in a user-friendly interactive theorem prover, 
which we call Dr. Doodle for its drawing mode. Figure 2 shows a sample screen- 
shot. This system was designed to minimise the potential effect of differences in 
the interface/presentation methods, so that logics could be compared without 
other factors unduly affecting the results. 




Fig. 1. A redraw rule for “X an open set, $x \in X \Rightarrow \exists \epsilon > 0$ s.t. $\{x' : |x' - x| < \epsilon\} \subset X$”




Fig. 2. Screenshot of Dr. Doodle in diagrammatic reasoning mode 

3 Experimental Design and Results

Two experiments were conducted: the first used 10 1st-year students (for whom
the material covered was new), and the second 10 2nd- and 3rd-year students (who
had seen the material before). For each experiment, the students were randomly
split into 2 groups of 5. One group worked using diagrammatic reasoning, the
other used algebraic reasoning. Over two 45-minute lessons, the students were
taught to use the Dr. Doodle system, then tested on a set of exercises. Lessons
were conducted entirely on computer without human interaction. Results from
the 1st years showed that the test exercises were too ambitious, producing a very
coarse-grained measure of ability. The exercises were therefore modified for the
2nd experiment, which gave a better spread of results (it is probably this which
is responsible for the difference in the results between experiments). For each
student, we calculated two measures: ‘score’ (correct exercise questions), and
‘inefficiency’ (wasted actions, based on analysis of user-logs). Informal feedback
was also gathered via a questionnaire.



Experiment       Reasoning Style   Score           Inefficiency
1st years        Diagrams          30%, σ=22%      1.49, σ=0.98
1st years        Algebra           30%, σ=14%      2.74, σ=1.99
2nd/3rd years    Diagrams          63%, σ=6.4%     1.27, σ=0.03
2nd/3rd years    Algebra           42%, σ=20%      1.63, σ=0.19

Both experiments show students working more efficiently when using diagrams.
The second experiment also shows the diagrams group scoring higher.
With such a small sample, we would not expect statistically strong results.
However, the results from the second experiment do show statistically significant
support for the conjecture that diagrammatic reasoning is better at this task³
(both measures are significantly better at 95% confidence using a one-tailed t
test). Informal feedback gave comments such as: “The pictures were useful for
helping understand what was going on. Better than written explanations a lot
of the time.”
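
As a rough plausibility check of the significance claim, the t statistics can be recomputed from the summary table alone. The sketch below rests on assumptions the text does not state: that each σ is a sample standard deviation, that a pooled-variance Student's t test with 5 students per group was used, and that the relevant one-tailed 95% critical value is the tabulated 1.860 for 8 degrees of freedom.

```python
import math


def pooled_t(mean_a, sd_a, mean_b, sd_b, n):
    """Student's t statistic for two independent groups of equal size n,
    using the pooled variance; degrees of freedom are 2n - 2."""
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2
    return (mean_a - mean_b) / math.sqrt(pooled_var * 2 / n)


T_CRIT_ONE_TAILED_95_DF8 = 1.860   # tabulated critical value for df = 8

# Second experiment (2nd/3rd years), n = 5 per group.
t_score = pooled_t(63, 6.4, 42, 20, 5)          # diagrams score higher
t_ineff = pooled_t(1.63, 0.19, 1.27, 0.03, 5)   # algebra wastes more actions
# First experiment (1st years), inefficiency only; the mean scores are equal.
t_ineff_first = pooled_t(2.74, 1.99, 1.49, 0.98, 5)
```

Under these assumptions the two second-experiment statistics (roughly 2.2 and 4.2) exceed the critical value, while the first-experiment inefficiency statistic (roughly 1.3) does not, which is consistent with the claims in the text. With such unequal variances a Welch test would arguably be more appropriate; it gives the same statistics here but fewer degrees of freedom.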

4 Conclusion 

These positive results are not surprising. As the domain is a geometric one, we
would expect visual representations to be useful. We conclude that diagrammatic
reasoning is a useful tool in this field. However, further experiments are desirable,
especially as these experiments did not look at the interesting questions of how
and why diagrams are useful here (and hence how general these findings are).
This work is described in more detail in the first author’s forthcoming Ph.D.
thesis. We hope that this project will be extended to produce a tutorial system
for this domain.

³ Although it is also possible that the difference comes from having mental access to
multiple representations, rather than from diagrammatic reasoning per se.

Abstract. While developing knowledge-based versions of two greenhouse design
and simulation tools, DAMOCIA-Design and DAMOCIA-Sim, we realized the
convenience and advantages of using multiple diagrammatic notations.
First, we used an extension of the Task-Method Diagrams of the
CommonKADS methodology to model the relation between the different tasks
to be done and the methods we could use to solve them. Method Flux Diagrams
model the relation between the different elements composing a method. To
implement the design software, we selected a distributed architecture,
DACAS, that integrates agents using behavior definitions. These are modeled
as execution plan diagrams. In this work, we present how these execution plan
diagrams can be generated in a general way from the task-method and method-flux
diagrams included in the knowledge model of the system.



1 Introduction 

One problem we find when actually developing knowledge-based tools using
methodologies such as CommonKADS [1] is the gap between the theoretical
knowledge models (mainly knowledge models of the tasks to be accomplished) and
the real structure of the software to be implemented. By using related diagrammatic
notation sets to model both the knowledge model (at the system analysis stage)
and the high-level execution elements (at the design stage), together with a
transformation mechanism (composed of a set of steps and rules), it is possible to
extract the high-level algorithmic structure of the tool semi-automatically from the
knowledge model.

In this work, we present a proposal for these notation sets and the transformation
mechanism. This proposal is part of the methodological results of the DAMOCIA
project (Computer Based Design and Building of Automated Greenhouses),
developed by our research group and financed by the European Union within the
framework of the ESPRIT projects (P7510 PACE) and by the Ministry of Industry of
Spain (PATI PC-191).

2 Notation of the Knowledge and Execution Diagrams

One key point in generating the design model, in our work schema, was the use of
diagrammatic notation, because it simplifies models and improves understanding.
To model the knowledge system, we proposed two graphic representation tools:
Task-Method Diagrams (TMDs) and Flux of Method Diagrams (FMDs). The first group of
diagrams represents the relation between tasks, methods, primitives and transfer
functions, using the original notation of CommonKADS and applying decomposition rules.

To assemble the target tools, it was necessary to code the different execution
and evaluation processes integrated into our distributed architecture, DACAS [2], and
the behavior definition used by the architecture to control their execution. To
make the behavior definition easier to understand, we proposed a graphic notation,
the Execution Plan Diagram (EPD), to model it.

Flux of Method Diagrams let us represent graphically the control flow within the
methods, independently for each method. Using a notation intermediate between
the TMDs and the EPDs, they are fundamental for assembling the execution
plans of the DACAS architecture. This notation lets us model the algorithmic
structure of a method, linking its basic components: subtasks, transfer functions and
inferences. It is an alternative to the pseudo-code used in CommonKADS
to describe how this decomposition is managed. Usually, FMDs can be extracted
from the specific TMD elements, offering a more compact and clear description. The
objective of the next section is to present a methodology to obtain the Execution Plan
Diagrams by integrating previously defined TMDs and FMDs.



[Figure 1 panels: sequence of processes; selection of processes; parallel
execution of processes]

Fig. 1. Notational elements of the Execution Plan Diagrams

3 Methodology

The behavior definitions and/or execution plans define how the control module of
the DACAS architecture manages the different agents. Execution plans define the
behavior of the system in an abstract form, before the different parameters are
instantiated. Behavior definitions are instantiated execution plans.

There are two types of agents to take into account: real agents and software agents.
Real agents communicate easily through the network and/or operating system
communication mechanisms. Software agents need to be integrated by developing an
execution plan. Such plans can be generated through a five-step method:

1. Define the scope of the software agents from the TMD. 

2. Initialize the execution plan with the first method selection block. 

3. Substitute methods by their FMDs. 

4. Substitute the different subtasks by their method selection blocks or DACAS 
processes. Substitute inferences and transfer functions by DACAS processes. 

5. Repeat steps 3 and 4 until there are no more method references.
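
The five steps above can be sketched as a recursive substitution. The data layout is an assumption made for illustration only: `TMD` maps each task to its candidate methods, `FMD` maps each method to a control structure ("seq" or "par") over named parts, and the task and method names are invented. Step 1 is represented simply by which tasks appear in `TMD`. The sketch assumes the task-method decomposition contains no cycles.

```python
# Hypothetical knowledge model: TMD lists the methods available for each
# task; FMD gives each method's control flow over subtasks and primitives.
TMD = {
    "design": ["refine"],
    "evaluate": ["simulate"],
}
FMD = {
    "refine": ("seq", ["propose", "evaluate"]),
    "simulate": ("par", ["run_model", "read_sensors"]),
}


def selection_block(task):
    """Steps 2 and 4: a task becomes a choice among its methods."""
    return ("select", [("method", m) for m in TMD[task]])


def substitute(name):
    """Step 4: subtasks get selection blocks; inferences and transfer
    functions become processes of the architecture."""
    return selection_block(name) if name in TMD else ("process", name)


def expand(node):
    """Steps 3 to 5: substitute until no method reference remains."""
    kind, body = node
    if kind == "method":                       # step 3: method -> its FMD
        control, parts = FMD[body]
        return (control, [expand(substitute(p)) for p in parts])
    if kind == "process":                      # leaf: nothing to expand
        return node
    return (kind, [expand(child) for child in body])


plan = expand(selection_block("design"))
```

The resulting `plan` is a nested tuple that contains only selection, sequence, parallel and process nodes, mirroring the notational elements of the Execution Plan Diagrams.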

4 Conclusions and Further Work

The main conclusions of the present work are:

1. Formalizing knowledge models of the CommonKADS methodology using
mainly Task-Method and Extended Task-Method Diagrams improves
understandability.

2. Defining related diagrammatic descriptions of the control inside methods and
of the final task behavior helps to obtain the latter design schemas.

3. We have defined an automatic mechanism to obtain task behavior descriptions
from previous TMDs and FMDs in cycles of five steps.

Current and future developments include:

1. Assembling a graphic software tool to develop different sets of diagrams and
manage the assembly of the behavior description automatically.

2. Using the proposed methodology in other application fields.

References 

[1] Schreiber, G., Akkermans, J.M., Anjewierden, A.A., de Hoog, A., van de Velde, W., 
Wielinga, B.J.: Knowledge Engineering and Management. The CommonKADS 
Methodology. MIT Press (1999) 

[2] Bienvenido, J.F., Corral, A., Marin, R., Guirado, R.: DACAS: A Distributed
Architecture for Changeable and Adaptable Simulation. In: Proceedings of the
International ICSC Symposium on Engineering of Intelligent Systems, EIS'98, Vol. 3,
Artificial Intelligence, Tenerife, Spain (1998) 101-107



Selected Aspects of Customization 
of Cognitive Dimensions for Evaluation 
of Visual Modeling Languages 



Anna E. Bobkowska

Gdansk University of Technology
Narutowicza 11/12, 80-952 Gdansk, Poland
annab@eti.pg.gda.pl



Abstract. For the successful application of diagrams in software engineering, 
high quality visual modelling languages (VML) are required. There is a need 
for new effective methodologies of VML evaluation. This paper discusses 
selected aspects of applying cognitive dimensions as a basis of the evaluation. 
Then, it briefly presents CD-VML methodology which integrates the cognitive 
dimensions with a theory of visual modelling languages. Finally, it summarises 
results of empirical evaluation of the methodology made with a CD-VML-UC 
questionnaire - a product of the methodology for use case diagrams. 



1 Cognitive Dimensions for VML Evaluation 

Models are used in software development for the purpose of 'visualising, specifying, 
constructing and documenting the artefacts of software-intensive systems' [7]. In the 
Model Driven Architecture (MDA) approach [6], they play the central role in the 
activities of software development, integration and maintenance. MDA delivers an 
advanced framework for several types of models as well as methods and tools for 
their mappings and automatic transformations. The quality of the models is a crucial 
factor in the successful implementation of the MDA vision. However, there are 
still a few open questions: What are high quality models? Which visual modelling 
languages support creating such models? And how can this be verified? 

The main problem to overcome is complexity. Software systems are becoming 
increasingly complex, and so are their models. To reduce the 
complexity that software developers face, numerous interrelated diagrams are in 
use. Each represents the system from a given perspective at a certain level of abstraction. 
Therefore there is a need to manage not only simple models, but also the 
relationships between their views. Moreover, MDA assumes the 
existence of a few interrelated representations of the same system, which means a 
need to manage many interrelated models. To realise this vision 
successfully, more advanced modelling languages are required than the simple notations 
used some time ago. Progress is also needed in the area of methods of evaluation. 
There is a need for new methodologies of visual modelling language (VML) 
evaluation, which could verify the quality of contemporary VMLs in an effective and 
efficient manner. They should also be flexible enough to enable evaluation of 
different VMLs in several contexts of their use. 

Although there are many approaches to the definition of the quality of models and 
VMLs, most of their descriptions fall into the categories of expressiveness, usefulness 
for automatic transformations and usability for developers. The aim of applying 
cognitive dimensions (CD) in this research was to facilitate evaluation of the last of these. It is 
maintained that the cognitive dimensions can easily be applied to any notational 
system, because they were designed for such systems. However, previous empirical studies 
[3] applying the cognitive dimensions questionnaire [2] to VML evaluation 
revealed several problems. The CD questionnaire was too general, imprecise, 
difficult for users and inefficient, and it did not uncover problems 
that were previously unknown. Because of this, participants of these studies did not find 
it very useful as a method of evaluation. More interesting results were achieved with 
the CD-VML methodology, which integrates the cognitive dimensions with the theory 
of visual modelling languages. 



2 CD-VML Methodology 

The approach to customisation of the original CD questionnaire is presented in Fig. 1. 
In the first stage, redefined cognitive dimensions were integrated with the 
terminology of VML. The VML definition covered the fundamentals of understanding a visual 
modelling language, distinguishing between its syntax, semantics, 
pragmatics and notation. The result of this transformation was the universal CD-VML 
template, which shifted the CD questionnaire from the general area of notational 
systems to the area of VML. 

In the second stage, questionnaires for several VMLs in their specific contexts of 
use can be developed. Each CD-VML-XX-YY questionnaire can be generated on the 
basis of the CD-VML template as well as the description of the VML (XX) and its context 
of use (YY). To help respondents focus their attention, the questionnaires contain tables 
with the relevant elements of the concrete visual modelling language to be considered. 
The resulting questionnaire serves as a tool for assessing, as efficiently as possible, all 
combinations of the aspects of evaluation and VML elements. 
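
The second stage can be pictured as a mechanical expansion of the template over the 
elements of a concrete VML. The Python sketch below is only an illustration under 
stated assumptions: the question wordings, the function name generate_questionnaire, 
the element list and the context label are hypothetical and are not taken from the 
CD-VML template or from the CD-VML-UC questionnaire. 

```python
from itertools import product

# Hypothetical template: one question pattern per cognitive dimension.
# The wordings are illustrative, not quoted from the CD-VML template.
TEMPLATE = {
    "visibility": "How easy is it to see or find '{element}' in a {vml}?",
    "error-proneness": "How easy is it to make mistakes with '{element}' in a {vml}?",
}


def generate_questionnaire(vml, context, elements, template=TEMPLATE):
    """Return one question per (dimension, element) combination."""
    questions = []
    for (dimension, pattern), element in product(template.items(), elements):
        questions.append({
            "dimension": dimension,
            "element": element,
            "context": context,
            "text": pattern.format(element=element, vml=vml),
        })
    return questions


# Assumed inputs for a use case (UC) questionnaire; the element names and
# the context label are placeholders chosen for this sketch.
uc_questions = generate_questionnaire(
    vml="use case diagram",
    context="requirements modelling",
    elements=["actor", "use case", "include relationship"],
)
```

Because one question is produced for every combination of dimension and element, the 
number of questions in this sketch is the product of the two counts, so it grows 
quickly as dimensions or elements are added. 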




Fig. 1. Approach to customisation of the CD questionnaire for VML 



3 Empirical Studies with CD-VML-UC Questionnaire 

The methodology was verified by trying it out on an example: the evaluation 
of use case diagrams with the CD-VML-UC questionnaire in practice. Forty-five good 
students were given the CD-VML-UC questionnaire and asked to fill it in. 
Additionally, they filled in another questionnaire, which allowed them to comment on 
the results achieved when filling in the CD-VML-UC questionnaire and to evaluate its 
usefulness for VML evaluation. 

The results of the study are satisfactory. The students discovered and explained all 
problems reported in the literature about use cases [1,4,5] and indicated some new 
ones. The strengths of the methodology are the precision of the questions, the wide range of 
issues covered and the ease of use. The weaknesses are the excessive length of the 
questionnaire and redundancy, in the sense of similar questions in different parts of the 
questionnaire or at least similar answers to them. Most participants discovered 
something new about different aspects of use cases or modelling. The usefulness of 
the questionnaire was evaluated as high. In conclusion, the CD-VML methodology 
satisfies the criteria of effectiveness and flexibility and only partially the criterion of 
efficiency. The results came out of the integration of the theory of visual modelling 
languages with the cognitive dimensions. The theory of visual modelling languages 
delivered a precise model of the VML and the cognitive dimensions delivered a set of 
criteria related to the cognitive aspects. 



Abstract. This builds on previous work in which we have developed 
diagramming principles based on theories of structural object perception. We 
call these geon diagrams. We have previously shown that such diagrams are 
easy to remember and to analyze. To evaluate our hypothesis that geon 
diagrams should also be easy to understand we carried out an empirical study to 
evaluate the learnability of geon diagram semantics in comparison with the 
well-established UML convention. The results support our theory of 
learnability. Both “novices” and “experts” found the geon diagram syntax 
easier to apply in a diagram-to-textual description matching task than the 
equivalent UML syntax. 



1 Introduction 

Conceptualizing the design of a system is an important element of the entire software 
development process. This activity is supported by the use of sketches and diagrams 
to capture various aspects of the system being modeled. Many forms of diagrams 
have been developed for modeling software engineering problems, such as those 
available through the Unified Modeling Language (UML) [6]. Although these 
diagrams are general and complete, the choice of graphical notations appears to be 
somewhat arbitrary, so that only an expert in the field can easily learn them. 

To some extent, learning and using software engineering semantics is analogous to 
learning semantics in a natural language. Chomsky's theory that language 
understanding is based on innate deep cognitive structures is now widely, if not 
universally, held [2]. It has also been argued that there is a similar deep structure in 
vision, although the purpose of this structure is not communication but perception of 
the environment. The perceptual theory of Marr contains visual primitives such as 
“blobs”, “bars”, and “terminations” [5]. These are interpreted according to a visual 
syntax thereby enabling us to understand 3D structured objects [1]. Jackendoff [4] 
argues that the rules of visual structure are similar to verbal language rules. He 
further proposes that there are cognitive “correspondence rules” between the visual 
meaning of a 3D structure and linguistic structure. This provides a natural link 
between visual structure and linguistic structures that may help explain why certain 
kinds of diagrams are easy to understand. 

In our previous work [3], we derived a set of “naturally” occurring “geon 
correspondence rules” (or GCRs) to map diagram semantics. Here we describe the 
subset relevant to software class structures: 

Fig. 1. Representing some related entities in a system describing a Space Center 



GCR1: Inheritance - Geons with the same shape can be used to denote an is-a relationship. 

GCR2: Dependency - If geon A is on-top-of geon B, this suggests that geon A is 
supported by geon B. 

GCR3: Aggregation - Geon A is contained within geon B, shown as an 
internal component attached to the same primitive geon on the outside. 

GCR4: Multiplicity - Multiple associations between two entities are best denoted 
by a series of attachments. 

Figure 1 illustrates an example of how these rules describe related entities of a 
Space Center. The Space Center has-many Buildings (containment with multiple 
connecting lines), and has-many Spacecraft. A Gas Station and a Lab are two 
different types of buildings (same shape primitive as Building). The Gas Station has 
Fuel (containment with connection). Shuttles are also a type of Spacecraft (same 
shape primitive). A Shuttle has-many Wings and has-one Engine. The Engine 
depends on Fuel (depicted on-top-of the Fuel entity). 
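
The Space Center description can also be written down as a small data structure, which 
makes explicit which rule applies to each relationship. The Python sketch below is an 
illustration only: the triples are read off the prose above, while the names GCR, 
SPACE_CENTER and drawing_rules, the rule paraphrases, and the reading of has-many as a 
combination of GCR3 and GCR4 are assumptions made for this sketch. 

```python
# Relation -> visual rule, paraphrased from GCR1-GCR4 (wording is ours).
GCR = {
    "is-a": "GCR1: same shape primitive",
    "depends-on": "GCR2: drawn on-top-of the supporting geon",
    "has-one": "GCR3: contained within, with a connection",
    # Assumed reading: containment (GCR3) plus multiple attachments (GCR4).
    "has-many": "GCR3 + GCR4: contained within, with multiple attachments",
}

# (source, relation, target) triples taken from the Fig. 1 description.
SPACE_CENTER = [
    ("Space Center", "has-many", "Building"),
    ("Space Center", "has-many", "Spacecraft"),
    ("Gas Station", "is-a", "Building"),
    ("Lab", "is-a", "Building"),
    ("Gas Station", "has-one", "Fuel"),
    ("Shuttle", "is-a", "Spacecraft"),
    ("Shuttle", "has-many", "Wing"),
    ("Shuttle", "has-one", "Engine"),
    ("Engine", "depends-on", "Fuel"),
]


def drawing_rules(model):
    """List, for each relationship, the rule that would be used to draw it."""
    return [f"{src} {rel} {dst} -> {GCR[rel]}" for src, rel, dst in model]


for line in drawing_rules(SPACE_CENTER):
    print(line)
```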

While our previous results showed that the geon correspondence rules were easier 
to recall and more intuitive than UML rules [3], they did not say anything directly 
about their ability to help people match a diagram to a problem domain. 

2 Experiment 

We hypothesized that diagram semantics can be learned more 
accurately from diagrams created with the perceptual notation presented above than from 
an equivalent UML graphical notation. Error rate was measured for 
matching diagrams to informal written descriptions of various real-world problem 
domains. 

2.1 Method 

Diagrams. Five problem descriptions incorporating the semantics of generalization, 
dependency, one-to-many, and aggregation were constructed. The semantics in the 
problems were clearly presented using their common terminology. For example, to 
describe related entities of a neighborhood (Fig. 2), the following text was provided: 
"The neighborhood depends on the city for clean water, sewage and garbage 
disposal. Many families live in this neighborhood. The neighborhood has a school 
and a pharmacy. It also has several types of stores: a grocery store, a convenience 
store, and a bakery." Four diagrams were created for each problem. Only one of them 
accurately depicted the relationships in the corresponding problem description. The 
remaining three diagrams misrepresented several relationships. 

Training. Twenty-six paid volunteers were collectively given an hour of 
instruction on the various semantics and their respective notations. The training 
included an introduction to object-oriented modeling, a description of each semantic 
with its UML and geon diagram notation, and sample UML and geon diagrams of 
complete systems with objects and their relationships. The emphasis during the 
training was placed on the concept underlying each semantic. Only during this 
training phase did subjects have access to the notations. The subjects were 
asked to return a week later for the experiment, where they were tested 
individually. 

Task. For this experiment we used a diagram-to-problem matching paradigm. After 
reading each problem description, the subject was asked to match one of the four 
diagrams created for that problem. The subject marked on the hand-out sheet the 
number of the matching diagram. The problem descriptions remained available to the 
subjects while they read the diagrams, so they could occasionally consult the 
description. Therefore we were not testing subjects' memory of a given problem text. 

Subjects were restricted to two minutes for matching a diagram to a problem 
description. A within-subject design was used where half the subjects matched the 
UML diagrams first and the other half matched the geon diagrams first. 
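
The counterbalanced order described above can be sketched in a few lines of Python. 
This is an illustration under assumptions: the text does not say how subjects were 
split into the two halves, so the random assignment, the fixed seed and the function 
name counterbalance are hypothetical. 

```python
import random


def counterbalance(subject_ids, seed=0):
    """Assign half the subjects to UML-first and the rest to geon-first.

    The random split is an assumption of this sketch; the paper only states
    that each half started with a different notation.
    """
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for position, subject in enumerate(ids):
        if position < half:
            orders[subject] = ("UML", "geon")
        else:
            orders[subject] = ("geon", "UML")
    return orders


# Twenty-six subjects, numbered 1..26 for the purpose of this sketch.
orders = counterbalance(range(1, 27))
```

Every subject still sees both notations, which is what makes the design 
within-subject; only the order differs between the two halves. 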

Twelve of the 26 subjects had previous exposure to UML (experts); the others had 
never been exposed to UML (novices). 

2.2 Results 

The results were obtained by averaging each subject's scores. Overall, subjects matched 
the informal problem descriptions to geon diagrams with an error rate of 14.6% vs. 
36.2% with the UML diagrams. A one-sample t-test (or sign test) 
shows that subjects performed significantly better with the geon diagrams (p < 0.0001). 
Combining the results, we can say that there were more than twice as many errors in 
analyzing and matching the UML diagrams as the geon diagrams. 
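
The "more than twice as many errors" statement follows directly from the two reported 
error rates, and a sign test of the kind mentioned can be computed exactly from counts 
of subjects. The Python sketch below is illustrative only: the function names are 
hypothetical, per-subject scores are not given in the text, and no per-subject data 
are assumed here. 

```python
from math import comb


def error_ratio(error_a, error_b):
    """Ratio of two error rates expressed in the same unit."""
    return error_a / error_b


def sign_test_p(n_better, n_worse):
    """Two-sided exact sign test p-value; ties are assumed to be discarded."""
    n = n_better + n_worse
    k = min(n_better, n_worse)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Reported overall error rates in percent: UML 36.2, geon 14.6.
ratio = error_ratio(36.2, 14.6)
print(f"UML errors / geon errors = {ratio:.2f}")
```

To apply sign_test_p, one would count how many subjects made fewer errors with one 
notation than with the other; those counts are not reported in the text. 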
Fig. 2. Sample UML and equivalent Geon diagram for representing entities of a neighborhood 



2.3 Discussion 

The results obtained from the experiment described here show that the mapping of 
specific software engineering semantics (inheritance, dependency, etc.) onto “geon 
correspondence rules” can be used as guidelines for making effective diagrams. In 
particular, we see that the geon diagrams are well suited for learning a subset of 
object-oriented concepts such as those necessary for modeling software class 
structures. In comparing the learnability of matching problems to diagrams, we found 
that subjects, regardless of their experience in software modeling, were capable of 
learning and interpreting the perceptual syntax with fewer errors. The results were 
particularly significant in showing that with very little training, experts (subjects 
experienced only with UML diagrams and semantics through a software engineering 
course at the university) performed better with the geon diagrams. The use of a 
diagrammatic notation that requires minimal training may be particularly useful in 
instances where end users are involved in the development process and therefore need 
to quickly learn the diagrammatic notations. The experiment described in this paper 
focused on only one aspect of learning, i.e. matching problem descriptions to 
diagrams. While this constitutes a justifiable starting point for this line of research, 
further experimentation needs to be conducted in order to determine whether the geon 
notation can facilitate the process of software modeling by allowing the user to create 
proper abstractions of a problem. 